A.  THE BACKGROUND

Computerized data processing applications have grown
over the past thirty years to a point where they have
become a pervasive influence in our society. Early data
processing systems used magnetic tape as the principal
storage medium for large data files. The processing was
batch-sequential on a job-by-job basis and the application
was mainly accounting. These systems had only secondary
impact on the operational aspects of the business. These
early computers were in sharp contrast to the data
processing systems of today.

In modern data processing systems many different jobs
can run concurrently, with very large capacity on-line
storage (i.e., directly accessible without human
intervention), data-base-oriented transaction processing,
and application to every operational aspect of the
business.

Over the past thirty years, since the first
vacuum-column magnetic-tape transport in 1953 and the first
movable-head disk drive in 1957, tape and disk devices in
many configurations have been the principal means for
storage of the large volumes of data required by this
phenomenal aggrandizement of data processing systems.
Magnetic drums and other device geometries have also been
important system components, but to a lesser extent.
Improvements in the cost, capacity, performance, and
reliability of on-line storage devices fueled these growing
systems and their application capability.

As the range of applications has grown, a continuing
concern has been the cost and access time of data storage.
A wide range of technologies has been investigated to
address this challenge. As rapid as progress in
storage technology has been, the need for more capacity
with improved access has increased even faster. The use of
storage technologies depends on three principal factors:
cost per bit, access time and unit-device cost. The reduced
cost per bit in all technologies derives primarily from
an increase in the density of the material being used for
recording. The lower cost per bit is also associated with
an increase in the physical size of the basic storage
unit. In low-end systems, the unit-product cost is crucial.
While the conventional recording (i.e., the magnetic
recording) is entering yet another phase of explosive
growth in applications and technology in order to meet
these stringent requirements, the optical disks have
begun to challenge the magnetic media. There are
pressures to break free of the limitations of magnetic
storage where large volumes of data are involved. These
pressures come from the continuing growth of conventional
storage, existing requirements of large corporate
and governmental databases, and the development of new
applications such as storage of digitized documents where
large volumes of data must be stored at low cost. Such
applications often demand a cost, capacity and performance
that is difficult to achieve magnetically. Optical
storage is able to provide performance that is
competitive with the performance of magnetic recording. In
fact, emerging optical technologies are already capable of
replacing magnetic disks in certain applications. However,
there is no single technology that is right for all
applications. Thus, data processing installations often
have available a wide range of different storage
technologies. The individual needs of each application
must be analyzed to determine the appropriate technology to
utilize.

B.  THE ORGANIZATION OF THE THESIS

The purpose of this thesis is to examine high-volume,
on-line storage media of current and emerging technologies
and software techniques for supporting these on-line, high
capacity storage media. This thesis has two major parts.
In the first part, we analyze such media as vertical
magnetic recording, thin film media, optical data disks,
magneto-optic disks, bubble and Bernoulli-effect disks.
Then, comparisons and evaluations of products and product
categories are illustrated. In the second part, we review
the modern software techniques for on-line database storage
and access. Thus, this thesis is organized into two parts:

Part I: Modern Hardware Technologies for On-Line
Database Storage and Operation, and

Part II: Modern Software Techniques for On-Line Database
Storage and Access.


On the hardware, this thesis consists of seven chapters.
Chapter II is on magnetic recording. Chapter III is on
bubble-memory recording and Chapter IV is on vertical
recording. Chapter V is on optical recording. Chapter VI
is on magneto-optic recording. Chapter VII is on two other
recording technologies, random-access memory and the
Bernoulli box. The final chapter, Chapter VIII, is on the
technology comparisons.

On the software, this thesis consists of six chapters.
Chapter X is on data abstraction. Chapter XI is on data
access and retrieval methods. Chapter XII is on data
compaction. Chapter XIII is on data models for storage.
The final chapter, Chapter XIV, is on differential files.

The last chapter of the thesis, Chapter XV, is the
conclusion for the hardware portion and the software portion
of the thesis.


II.  THE MAGNETIC RECORDING


The magnetic recording consists of the conventional
recording, bubble memory recording, and vertical recording.
The last two recording technologies are discussed in
considerable detail in the chapters that follow.

A.  THE CONVENTIONAL RECORDING

"Conventional recording is entering yet another phase
of explosive growth both in applications and in
technology." [Ref. 4]

For the past thirty years the conventional magnetic
recording has, almost exclusively, fulfilled the data
storage requirements of the data processing community.
During that period of time significant advances in
aspects of conventional storage technology have
resulted in very significant operational gains.

In this section, conventional recording as a surface-
area technology is discussed. First, conventional
recording's operation is examined. Secondly, both fixed-head
and movable-head disks of conventional recording are
investigated. Then, their technological implications are
scrutinized. These implications include their storage-device
capacity, which is a direct function of the areal density of
recording, the surface area provided by the storage media,
and the efficiency of their utilization. The higher
storage densities have required improvement in conventional
recording resolution, which has been achieved through
reduction in head-disk spacing and in medium thickness.
Progress in reducing the head-to-surface spacing has been
the key factor in increasing the linear density of disk
storage. A boundary layer of air is used to provide an
air bearing which in turn determines the spacing. The
progress in air bearing technology (bearing design and the
surface finish and material properties of head and medium)
has reduced spacings to .25 microns, with laboratory studies
at spacings as low as .1 micron (one micron is about
4 x 10 to the -5 inches).

The track density (track density x linear density =
areal density) of conventional disks is determined by the
accuracy and tolerance of the head positioning mechanism
and the transverse resolution of the read-write head, as
long as an adequate signal-to-noise ratio is available as
the track width is reduced. Over the same time period
the track density has increased from 20 to almost [illegible]
tracks per inch. Placing the servo information with
the data and utilizing better head-disk assembly
packaging will lead to further advances.

To date, advances in disk media have been made by going
to thinner and smoother coatings to improve resolution and
to reduce demagnetization. These advances are elaborated
later in the chapter. Thin films are being pursued. With
these films, it is easily possible to produce medium layers
of less than 25 nanometers (a nanometer = one billionth of a
meter).

With respect to conventional recording's efficiency, most
of the improvements during this decade will continue to come
from the increased areal density. The track density can be
increased by reducing the track widths, and the linear density
can be increased through continuing improvements in recording
resolution, as a result of the decreased medium thickness,
the reduced head-gap and head-disk spacing, and the
increased use of sophisticated signal processing and
error-correction codes.
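
The relation between the three densities can be sketched in a few
lines of Python. This is only an illustration of the arithmetic; the
function name and the numbers are hypothetical and are not taken from
any drive discussed in this thesis.

```python
def areal_density(tracks_per_inch, bits_per_inch):
    """Bits per square inch = track density x linear density."""
    return tracks_per_inch * bits_per_inch


# Hypothetical values, chosen only to illustrate the arithmetic.
base = areal_density(tracks_per_inch=500, bits_per_inch=10_000)

# Halving the track width doubles the track density, and with an
# unchanged linear density it doubles the areal density as well.
narrower_tracks = areal_density(tracks_per_inch=1_000, bits_per_inch=10_000)

print(base)                    # 5000000
print(narrower_tracks / base)  # 2.0
```

The same doubling could be obtained by doubling the linear density
instead, which is why improvements in either dimension are treated
together as areal density.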

1.  Operations

The Danish engineer Valdemar Poulsen exhibited the
first magnetic recorder at the Paris Exposition of 1900. It
came 23 years after Thomas Edison had built the phonograph.
In Poulsen's device a steel piano wire was coiled in the
spiral groove around the surface of a drum. An electromagnet
made contact with the wire and was free to slide along
a rod positioned parallel to the axis of the drum.
The drum's rotation pulled the electromagnet along the rod.
When the current from a microphone passed through the
electromagnet, a segment of the wire (where the contact was
made) was magnetized in proportion to the current. Although
Poulsen's invention created a sensation, the recorded
signal was weak. It was not until the invention of the
vacuum-tube amplifier in the 1920's that magnetic recording
began its steady evolution. The piano wire evolved into
plastic tape with a certain amount of magnetic material. In
another configuration a rotating drum was coated with a
magnetic medium on which signals could be recorded on
numerous circular tracks. Each track had its own
electromagnet. Such devices became memories for the first
modern computers [Ref. 1].

a.  The Magnetic Writing

The magnetic writing, the recording of data in a
magnetic medium, is based on the same principle today
that applied in Poulsen's device. If a current flows
in a coil of wire, it produces a magnetic field. Thus,
the magnetic writing occurs as follows: The electric
current supplied to the head flows through a coil around a
core of magnetic material. The core throws a magnetic
field into the disk, thereby magnetizing the medium film
on the disk, i.e., writing the data [Ref. 1].

b.  The  Magnetic  Reading 


The head that writes the data can also be used to read
it. One way this is done is based on the principle of
induction, formulated by Michael Faraday in 1831. In the
principle of induction, a voltage is induced in an open
circuit, such as a loop of wire, by the presence of a
changing magnetic field. In the case of a head positioned
above a spinning magnetic disk on which data have been
written, the magnetic fields originate from the magnetized
regions on the disk. During the time the head is over a
single magnetized region the field is more or less uniform.
Hence no voltage develops across the coil that is a part of
the head. When a region passes under the head in which the
magnetization of the medium reverses from one state to the
other, there is a rapid change in the field. Hence a voltage
pulse develops. In this way the digital data in storage are
read as an analogue signal, which can be readily converted
back into digital form [Ref. 1].
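
The read process described above can be imitated with a small Python
sketch. It is a deliberately simplified model and rests on
assumptions that are not part of the text: each region of the track
is reduced to a magnetization of +1 or -1, and the induced voltage is
reduced to its sign. The function name read_pulses is hypothetical.

```python
def read_pulses(magnetization):
    """Return the sign of the induced voltage at each region boundary.

    magnetization: sequence of +1 / -1 values, one per region passing
    under the head. A uniform stretch induces no voltage (0); a
    reversal induces a pulse whose sign follows the direction of the
    change in the field.
    """
    pulses = []
    for previous, current in zip(magnetization, magnetization[1:]):
        change = current - previous  # 0, +2 or -2
        if change == 0:
            pulses.append(0)
        else:
            pulses.append(1 if change > 0 else -1)
    return pulses


track = [+1, +1, -1, -1, -1, +1]
print(read_pulses(track))  # [0, -1, 0, 0, 1]
```

Only the two reversals in the sample track produce pulses, which
mirrors the statement that no voltage develops while the head is
over a single magnetized region.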

2.  Fixed-Head and Movable-Head Disks

A magnetic disk is a direct access device which
has read-write heads that can both read and write data on
the surface areas of platter-shaped magnetic disks. As
illustrated in Figure 1, access arms are used to place the
read-write heads over the surface areas of the rotating
disks. Magnetic disks are available in both fixed-head
and movable-head form.

Figure 1.  A Magnetic Disk Drive

Fixed-head disks are not removed from a disk drive
unit. Figure 2 depicts a side view of a fixed disk,
which consists of six platters with 10 surfaces and n
read/write heads per surface. Each surface is divided into
concentric rings, called tracks. Normally, the
outermost surfaces of the top and bottom platters are not
used for storing data since they can be easily damaged.
Data is recorded on the tracks by the read/write heads, which
are arranged on a read/write head comb assembly that is
fixed in place. Since there is one read/write head per
track, no seek time (i.e., the time associated with the
shifting of a read/write head over a track of data) is
required to move a read/write head to the proper track on a
surface. This leaves only the time for rotational delay.
The rotational delay is the time required to wait for the
desired data to rotate under the read/write head once the
read/write head is positioned over the desired track.
This provides faster access time since access time is the
sum of the seek time and the rotational delay.
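
The access-time sum can be illustrated with a short Python sketch.
Several things in it are assumptions rather than statements from the
text: the average rotational delay is taken to be half of one
revolution, and the rotation speed (3600 rpm) and the movable-head
seek time (30 ms) are hypothetical values chosen for illustration.

```python
def average_rotational_delay_ms(rpm):
    """Average wait, assumed to be half of one revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational delay."""
    return seek_ms + average_rotational_delay_ms(rpm)


ASSUMED_RPM = 3600       # hypothetical rotation speed
ASSUMED_SEEK_MS = 30.0   # hypothetical movable-head seek time

fixed_head = access_time_ms(seek_ms=0.0, rpm=ASSUMED_RPM)
movable_head = access_time_ms(seek_ms=ASSUMED_SEEK_MS, rpm=ASSUMED_RPM)

print(round(fixed_head, 1))    # 8.3
print(round(movable_head, 1))  # 38.3
```

With the seek term set to zero, the fixed-head case is left with the
rotational delay alone, which is the advantage described above.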

Fixed-head disks are normally used in systems that
are either dedicated to one or a few applications or in which
files are required to be on-line with a low access time.
Characteristics of some of the commercially available
fixed-head disk units are illustrated in Figure 2a.

Manufacturer      Burroughs       DEC             IBM             Amcomp
Model             B9370-2         RS03            2305            B530/256

Surfaces/unit     2               1               12              4
Tracks/surface    100             64              32              128
Sector size       100 bytes       64 bytes        variable
Sectors/track     100             64              variable
Track capacity    10,000 bytes    4096 bytes      14,136 bytes    150K bits
Total capacity    2M bytes        262K bytes      5.4M bytes      76.8M bits
Average latency   17 ms           8.5 ms          2.5 ms          8.[illegible] ms
Transfer rate     300K            250K            3M              9M
                  bytes/second    bytes/second    bytes/second    bits/second

Figure 2a.  Characteristics of Fixed-Head Disk Units.

Movable-head disks are more common than fixed-head
disks because the disk packs are removable and, since
there is only one read/write head per disk surface, the
cost per bit of storage is less. Figure 3 depicts a side
view of a movable-head disk with 10 surfaces. The
read/write head comb assembly is moved in and out in order
to access all of the tracks on each surface, since there is
only one read/write head per disk surface. The
read/write heads usually move together as a unit, and
only one head can transfer data at a time. Thus, since
the comb assembly mechanism moves, a large recording
surface area can be covered with only a few read/write
heads. The characteristics of some movable-head disk
units are illustrated in Figure 3a.

There is yet another category of magnetic disks
which is a hybrid of the above two. The Winchester-type
disks are called fixed-media direct-access storage disks.
These are fixed disks, meaning that they employ a
nonremovable sealed head-disk assembly, with movable-head
disk units. In other words, the comb assembly, although
movable, is an integral part of the disk platters. Thus, one
can only replace one assembly and its platters with another
assembly and another set of platters. These fixed-media
disks were introduced by IBM in the early 1970's with the
3344, followed by the IBM 3350, and in 1979, the 3310,
3370, 3375 and in 1980 the 3380.

3.  Technological Implications

Technological implications of conventional disk
systems encompass the following three salient features:

(1) Material requirements,

(2) Features and benefits, and

(3) Limitations.


Manufacturer      HP              IBM             IBM             CDC
Model             2100            3330            3340            33801 AZ

Surfaces/unit     4               19              12              19
Tracks/surface    200             404             696             808
Sector size       256 bytes       variable        variable        -
Sectors/track     24              variable        variable
Track capacity    6144 bytes      13,030 bytes    8,368 bytes     13,030 bytes
Total capacity    4.9M bytes      100M bytes      69.9M bytes     400M bytes
Average latency   12.5 ms         8.4 ms          12.5 ms         16.7 ms
Transfer rate     312K            806K            885K            1.2M
                  bytes/second    bytes/second    bytes/second    bytes/second

Figure 3a.  Characteristics of Movable-Head Disk Units.
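
The total capacities listed for the movable-head units can be
cross-checked from the other rows of the figure, on the assumption
that total capacity is simply surfaces x tracks per surface x track
capacity. The Python sketch below uses the entries for the first
three units as best they can be read from the figure; the function
name is hypothetical.

```python
def total_capacity_bytes(surfaces, tracks_per_surface, bytes_per_track):
    """Assumed relation: capacity = surfaces x tracks x track capacity."""
    return surfaces * tracks_per_surface * bytes_per_track


# Entries as read from the figure of movable-head disk units.
hp_2100 = total_capacity_bytes(4, 200, 6144)
ibm_3330 = total_capacity_bytes(19, 404, 13030)
ibm_3340 = total_capacity_bytes(12, 696, 8368)

print(hp_2100)   # 4915200    (listed as 4.9M bytes)
print(ibm_3330)  # 100018280  (listed as 100M bytes)
print(ibm_3340)  # 69889536   (listed as 69.9M bytes)
```

The three products agree with the listed totals to the precision
shown in the figure.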


a.  Material  Requirements 


The materials for the magnetic recording
medium are arranged from the top to the bottom in Figure 4
from the oldest to the newest materials available today. Of
course, the writing and reading of data depend on the
magnetic properties of the media in which the data is
stored. The most common of these is the gamma form of
iron oxide, which is currently in use. Iron oxide is
desirable because its properties are best suited for
magnetic recording and its cost is very reasonable.
Moreover, its surface is uniform and homogeneous, which
makes the iron oxide ideal for recording.

To use this medium in the manufacturing of a
disk, the "chemical plated" process is utilized. The
chemical plated process is a process by which a paint-like
coating of iron-oxide particles suspended in a polymer
binder is applied to a substrate, such as an aluminum disk.
The aluminum disk is coated
with a slurry containing the iron oxide. The oxide in the
slurry consists of needle-like particles approximately a
micrometer (10 to the -4 centimeter) in length and a tenth
of a micrometer wide. The iron atoms in each particle have
their own minute magnetic fields, but the elongated shape
of the particle forces the fields into an alignment
along the particle's long axis. Each needle is
therefore a bar magnet, and has a dipole magnetic field.
The only possible change in the field is a reversal of the
north and south poles at the ends of the needle. The overall
magnetization in any given region of the disk is the
sum of the fields of the needlelike particles within it.

Plainly, the magnetization of a region of the
disk would be maximal if its needles were aligned and if
they all had their north (or their south) poles facing
in the same direction. The alignment of the needles
is achieved when the disk is manufactured, by
rotating the disk in the presence of a magnetic field before
the slurry has dried. The needles come to lie in the plane
of the disk and more or less perpendicular to a radius of
the disk. In an operating disk, the needles are more or
less aligned with the direction of motion of the disk.

The alignment of the poles is achieved when
the data is written. Specifically, it is achieved when the
head applies a magnetic field to the medium. The magnetic
particles are sufficiently far apart that their own
fields do not interact appreciably with one another.
However, as the strength of the applied field increases,
some of the magnetic particles whose dipoles are
opposite to the direction of the applied magnetic field
reverse their dipole field. Ultimately, the applied field
becomes strong enough to polarize all of the particles. Two
complications must be noted. First, the field of the
head falls off rapidly as the distance from the head
increases. Second, the medium is moving, and it therefore
passes out of the region in which the field is strong
enough to polarize the medium. It is the trailing edge of
the field that governs the final orientation of the
magnetization. When the field of the head is removed, the
region of polarized medium remains. That is why the
data can be stored. The magnetization can be driven
back to zero by reversing the flow of the current
through the coil in the head and thereby applying to the
magnetic medium a reversed magnetic field. Since
the magnetization persists in the medium, the reversal
of the magnetic field does not reverse the
dipoles by which the medium was magnetized in the first
place until the field reaches sufficient strength.

Fe2O3 (Uniform and Homogeneous)

CrO2 (Unsmooth Surface)

Cobalt-Iron Oxides (Temp Dependent)

Barium Ferrite (Temp Dependent)

Metals (Unsmooth Surface)

Figure 4.  Magnetic Recording Materials.

For a magnetic medium it is desirable that the
remanent magnetism (i.e., the magnetism that persists when
the magnetic field is absent) be large. It also is
desirable that a moderately large field strength be
required to demagnetize the medium. Both of these
requirements help to ensure the permanence of the stored
data. In addition it is desirable that the reversal of the
magnetization of this medium be accomplished over a small
range of applied field strength. This helps to ensure
that the states of the medium that are used for data
storage will be well defined. All of these criteria
are summarized by the requirement that the hysteresis
loop for the magnetic medium be large and nearly square
[Ref. 1].
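
A perfectly square hysteresis loop can be imitated with a very small
Python model. It is an idealization made for illustration only: the
medium is assumed to have exactly two remanent states and to switch
abruptly once the applied field reaches the coercive field. The
class name, parameter names and numbers are hypothetical.

```python
class SquareLoopMedium:
    """Idealized medium with a perfectly square hysteresis loop."""

    def __init__(self, remanence=1.0, coercivity=1.0, state=+1):
        self.remanence = remanence    # magnetization left at zero field
        self.coercivity = coercivity  # field strength needed to switch
        self.state = state            # +1 or -1

    def apply_field(self, h):
        """Apply field h, remove it, and return what remains stored."""
        if h >= self.coercivity:
            self.state = +1
        elif h <= -self.coercivity:
            self.state = -1
        # Weaker fields leave the stored state untouched.
        return self.state * self.remanence


medium = SquareLoopMedium(remanence=1.0, coercivity=2.0)
print(medium.apply_field(-1.0))  # 1.0   weak reverse field: no change
print(medium.apply_field(-2.5))  # -1.0  strong reverse field: switched
print(medium.apply_field(0.0))   # -1.0  state persists with no field
print(medium.apply_field(3.0))   # 1.0   switched back
```

A larger remanence gives a stronger stored signal, and a larger
coercivity makes the stored state harder to disturb, which is the
sense in which a large, square loop is desirable.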

In addition to the iron oxide, there are four
materials on the horizon as candidates for magnetic media:
chromium oxides, cobalt-iron oxides, barium ferrites, and
metals (see Figure 4). The four medium materials, although
of very high potential for the near future, are very limited
for current usage due to their inherent disadvantages and
high manufacturing cost. Disadvantages for chromium
oxides include difficulty in obtaining a smooth surface
and good orientation. Disadvantages for cobalt-iron oxides
and barium ferrites include the temperature dependence of
the coercivity, although this dependence can be reduced by
varying the composition of their components. Coercivity is
the ability of the material to resist accidental
demagnetization and self-demagnetization. Of course,
the higher the coercivity the better the medium is for
magnetic recording. The
last medium, metals, is the most promising, due to its
excellent coercivity. However, metals are currently the
most expensive. Metals also have other
disadvantages, such as a reduction in magnetism when exposed
to elevated temperatures and humidity.

Further, the four materials must be manufactured
by utilizing the "sputtering" process, which is the
process by which atoms or groups of atoms are ejected from
a metal surface. The iron oxide does not need this process,
since it utilizes the plated process. Although the
sputtering process generates a very clean surface and has 3
to 8 times the capability of the plated process, it does
cost approximately 60% more.

Figure 5 depicts the maximum coercivity of
each of the above materials, along with the three most
common anisotropic structures of medium materials.
Anisotropy is the phenomenon of a material in which there
exist preferred directions for the magnetization. Metals
are not included in the figure because there does not yet
exist sufficient information for comparison.

b.  Features and Benefits of Conventional Disks

The features of conventional disks are
illustrated in Figure 6. Benefits are listed below:

(1) lowest cost per bit as a read/write on-line
storage medium,

(2) a competitive marketplace based on numerous
suppliers and a large choice of product offerings,

(3) capacities up to gigabytes,

(4) read/write capability and nonvolatility,

(5) broad environmental tolerances,

(6) relatively modest entry cost,

(7) multi-billion dollar industry,

(8) established production processes,

(9) increasing demand for capacity, occurring faster than
growth in storage density,

(10) density still far from ultimate limits,

(11) a wide choice of materials, and

(12) completely reversible, inherently stable processes.

Development of technologies in key areas of magnetic head and
its air bearing support, disk substrate and its coating,
head-positioning actuator, and read/write electronics.

As a recording type of storage, conventional
disks have the advantages of non-volatility, lower cost,
direct access, allocation flexibility, a simple and reliable
recording process, and allowing update in place. Their
major disadvantages are two: the movable-head disks involve
the mechanical motion of the access assembly and long access
times; the fixed-head disks incur higher production cost.

c.  Limitations of the Conventional Recording

The conventional recording is limited by its
physical density as depicted in Figure 7. Figure 7 shows
that the current density equates to 1.2 gigabytes for the
IBM 3380, with the ultimate density equating to 22.5
gigabytes. The Patty II disk system, manufactured by
Nippon Telegraph and Telephone Co., is a prototype and is
discussed later.

Figure 7.  Limitations of Conventional Recording.

The key parameters which limit the linear
density are (1) the flying height of the head above the
media, and (2) the physical width of the transition
between neighboring, oppositely magnetized fields. The
increased linear density requires a balanced reduction in
these two parameters, and is ultimately limited by the
failure to maintain the minimum bit-error-rate (BER)
requirement for the storage device. As the linear
density is increased the BER grows due to systematic
peak shift associated with transition crowding along
the track, and/or reduced playback signal-to-noise ratio
(SNR). The SNR falls due either to increased noise arising
from media granularity and surface roughness, or signal
loss resulting from demagnetization, combined with the
need to resort to thinner layers of material in order to
reduce the transition width.

At a given linear density, the track
density achievable in magnetic data storage is
fundamentally limited by the inductive nature of the
magnetic read process. As the track width is reduced,
the read signal voltage falls proportionately and the
limiting track density is reached when the playback SNR
falls to the critical value required to sustain
acceptable BER. In practice, the achievable track
density is limited by the quality of the radial
positioning servo mechanism, and the degree of cross-talk
due to the fringing field of the read/write head coupling
to adjacent tracks. The highest track densities require
developments in all of these areas.

The linear density and track density are
fundamentally linked through the SNR requirement
mentioned above, with the highest track density
corresponding to reduced linear density (compared to its
limit) and vice-versa. The maximization of the overall
areal density requires an optimum trade-off between
these two parameters, depending on the media type and the
magnitude of the magnetization of the storage layer, as
well as the detailed performance achieved by the radial
tracking servo-mechanism [Ref. 2].

4.  Trends and Problems in Different Taxonomies

Figures 8 and 8a illustrate conventional disk
trends. Note that in Figure 8, the current disk capacity
at 12000 bits per inch equates to 1.3 gigabytes, and that
by 1990 it reaches 5 gigabytes. This is far greater than the
established doubling of storage capacity every 30 months,
as has been the case for the last forty years.

Figure 8a.  Rigid Disk Trends.

In Figure 8a [Ref. 3], note that IBM is
experimenting with the 3380 enhanced (E), which has a
storage capacity of 2.5 gigabytes per spindle, which
doubles the 3380's capacity, but is far short of Nippon
Telegraph and Telephone's (NTT) Patty II capacity, which
equates to 1.07 gigabytes per head disk assembly (HDA), or
8.6 gigabytes per unit, which has 8 HDA's. Moreover, the
NTT drive operates at a density greater than 40 million bits
per square inch, and has a track density of 1800 tracks
per inch (TPI), as well as a linear density of 25,400
bits per inch. The data rate is 1.5 megabytes per second,
and the seek time is 12 milliseconds. This level of
performance exceeds that of the IBM 3380 (E) in storage
density by almost a factor of two and in data rate by 50%.
Innovations include a sealed head disk assembly (HDA)
with a helium atmosphere, a thin film sputtered ferrite
disk, a flying height of 0.15 microns, and a unique
rotary multiactuator assembly. Both the IBM 3380 (E) and
the NTT Patty II are still only prototypes.
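
Two of the Patty II figures quoted above can be checked with simple
arithmetic. The Python sketch below assumes that the unit capacity
is the HDA capacity times the number of HDA's and that areal density
is track density times linear density; the variable names are
hypothetical, and the input values are those quoted in the text.

```python
GIGABYTES_PER_HDA = 1.07           # quoted capacity of one HDA
HDAS_PER_UNIT = 8                  # quoted number of HDA's per unit
AREAL_BITS_PER_SQ_INCH = 40e6      # quoted lower bound on areal density
LINEAR_BITS_PER_INCH = 25_400      # quoted linear density

# Capacity of a full unit.
unit_gigabytes = GIGABYTES_PER_HDA * HDAS_PER_UNIT

# Smallest track density consistent with the quoted areal and
# linear densities (areal = track x linear).
implied_min_tpi = AREAL_BITS_PER_SQ_INCH / LINEAR_BITS_PER_INCH

print(round(unit_gigabytes, 2))  # 8.56
print(round(implied_min_tpi))    # 1575
```

The first result agrees with the quoted 8.6 gigabytes per unit, and
the second shows that a track density of at least about 1575 TPI is
needed for the areal density to exceed 40 million bits per square
inch at the quoted linear density.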

Some of the problems that must be overcome in order
to achieve the anticipated conventional recording
performance are the need for better tracking-error,
positioning and servoing systems. These are currently very
poorly developed. A very high degree of accuracy is
required between the position of the read/write head
and the location of data on the media surface. Also, the
track densities appear very unlikely to exceed 2000 TPI.


III. THE BUBBLE MEMORY


"Bubble memory, initially touted as a universal
replacement for disk technology, is today regarded as a
technological flash in the pan." [Ref. 7]

The magnetic-bubble memory (MBM) is a solid-state
magnetic memory which employs shift registers. These
shift registers move magnetic domains which represent binary
data. The rotating magnetic fields of the domains are used
for the binary orientation of the data. Unlike conventional
semiconductor memory devices, which are produced from
silicon materials, MBM utilizes synthetic garnet or
amorphous materials.

The ingenious technological discovery of MBM dates
back to 1966, when Bell Laboratories scientists
discovered that magnetic bubbles could be used to record,
store, and read data by applying and manipulating
external magnetic forces. The following features of the
bubble phenomenon aided its development as a potential
memory device:

(1) Bubbles were stable over a range of the
magnetic bias field (i.e., stable
storage);

(2) Bubbles could be elongated by lowering
the magnetic bias field for further
manipulation; and

(3) Bubbles could be annihilated by raising
the bias field.

A. CONCEPTS

Bubbles are microscopic magnetic cylinders of
reverse polarization to that of the thin magnetic
film substance surrounding the bubbles on a memory chip.
The bubbles are the individual memory cells, only smaller,
and hence more densely packed, than those of conventional
semiconductor memory. The presence of a bubble indicates
a logic digit of "1" and the absence indicates a logic digit
of "0".

Figure 9 depicts the technique for creating bubbles. The
bubbles are created in memory chips made of two
layers. The first layer is a nonmagnetic substrate of
gadolinium gallium garnet about 0.015 inches thick. The
second layer is an extremely thin (3 micrometer)
ferromagnetic single crystal of garnet grown on the
substrate. The single crystal completely covers a 3 inch
diameter wafer, yielding up to 44 bubble memory chips of
1/4 square inch each. The magnetic film crystal is
magnetized at right angles to the surface so that magnetic
equilibrium occurs. Wavy interspersed areas of north and
south poles are created in total equal proportions. When an
external magnetic field, or "bias" field as it is usually
called, is imposed on the chips, magnetic regions with
polarity similar to the bias field expand and those regions
of reverse polarity shrink until they form tiny magnetic
cylindrical bubbles. These bubbles are like small islands
in an ocean of opposite magnetism. In other words, the
polarity of bubbles can be either north or south, but is
always opposite to the polarity of the bias field used in
the manufacturing process.

B. OPERATIONS

Figure 10 illustrates the operation of bubble memory
recording. Maintaining the bubbles and moving them
laterally throughout the film is a delicate operation.
The bubbles are stable within a certain intensity range of
the bias field. This field is created by two rectangular
permanent magnets, one above and one below the chip, which
develop the perpendicular magnetic field that generates
and maintains the bubbles. Above this range the bubbles
collapse and disappear, and below this range the bubbles
expand once again to form the wavy, stable, and
oppositely polarized magnetic regions. A varying
electromagnetic field, created by a pair of orthogonal
coils wound around the chip at right angles to each other,
provides a rotating field that moves the bubbles laterally
along a permalloy track whenever currents 90 degrees out of
phase are fed to the two coils. The permalloy tracks are
laid out on the garnet film using printed-circuit
techniques in chevron, T-bar, or semicircle patterns. As
the rotating magnetic field changes the instantaneous
polarity of the track elements, the bubbles move around the
track. The changes in polarity pull the bubbles through the
chevron areas and down the path. A bubble moves one stage
(one chevron, T-bar, or semicircle) along the track for
each full revolution of the magnetic field. The bubble
stream is kept in motion, passing "write" and "read" heads
at different points, with data being read as the bubbles
make a full revolution around the track.
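The shift-register behavior described above can be sketched in a few lines of Python. This is only an illustrative toy model, not a description of any actual device: the names make_loop, rotate_field, and read_loop are hypothetical, the detector is assumed to sit at a single fixed stage, and the read is assumed to leave the bubbles in place.

```python
from collections import deque

def make_loop(bits):
    """Model a closed propagation track; True means a bubble is present ("1")."""
    return deque(bool(b) for b in bits)

def rotate_field(loop, revolutions=1):
    """Advance every bubble one stage per full revolution of the drive field."""
    loop.rotate(revolutions)

def read_loop(loop):
    """Sample a fixed detector stage (index 0) once per field revolution.

    After len(loop) revolutions every stage has passed the detector and the
    loop is back in its starting arrangement.
    """
    seen = []
    for _ in range(len(loop)):
        seen.append(1 if loop[0] else 0)
        rotate_field(loop)
    return seen

# Hypothetical 8-stage loop, chosen only for illustration.
loop = make_loop([1, 0, 1, 1, 0, 0, 1, 0])
print(read_loop(loop))
```

Note that the detector sees the bits in the order they arrive at its stage, which is why a wanted bit may require up to one full circulation of the loop before it can be read.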

An MBM chip must also contain structures capable of
generating, annihilating, detecting, and replicating
bubbles. With such mechanisms, the basic functions of a
memory can be emulated by the magnetic bubble device. This
device is the controller.

Bubbles are generated by a nonmagnetic conductor loop,
called the "hairpin", which is inserted between the garnet
film and the special "pickax-shaped" permalloy
propagation track element (chevron, T-bar, or semicircle).
When a pulsed current passes through the loop, a magnetic
field opposite to the bias field creates a bubble, which
is then passed onto the track by the effect of the rotating
magnetic field. Changing bubble direction involves using the
same "hairpin" and "pickax" arrangement to create field
polarities which momentarily block the bubble movement
caused by the rotating electromagnetic field and divert the
bubble into other propagation tracks. Erasing old bubbles is
accomplished similarly to the method of changing bubble
direction, except that instead of being channeled to
another storage loop track, a bubble is removed from the
track, isolated, and erased by another pulse of proper
polarity strong enough to cause the bubble to collapse.

Bubble detection is either destructive or
nondestructive. In destructive detection, the bubble is
detected and read, but is destroyed by the read process
and does not remain in memory storage. In nondestructive
detection, the bubble is detected and replicated; the
replica is diverted to a "read" detector where that
bubble is read and then erased, and the original bubble
remains in storage. In the nondestructive read process, the
replicator basically splits a stretched bubble created by
the "hairpin" and separates the two clones. As the
rotating field operates, the two identical bubbles follow
separate paths, one to remain in memory and the other to
pass on to the detector and eventual erasure. The bubble
passing to the detector is stretched hundreds of times in
diameter and passed under magnetoresistive material.
This conductive material has a resistance which varies
with the strength of the surrounding magnetic field. A
small current, on the order of milliamperes, is normally
sent through this material in the chip. When the bubble
passes this material in the detection device, the
resistance of the material drops sharply and enough current
flows to produce a pulse announcing the presence of a bubble
[Ref. 8].

C. THE ARCHITECTURE

There are basically two categories of architecture for
MBM: the serial loop system and the major-minor loop system.
The major-minor loop system has three implementation
variations: transfer gate, block replicator transfer, and
block replicator swap.

Each of these four architectures uses function gates to
generate, replicate, detect, and erase data and a pair of
detectors to eliminate the effect of the rotating
magnetic field. The serial loop scheme will be mentioned
only briefly since it is seldom employed (see Figure 11a).
The serial loop scheme consists of a single serial loop
which forces the bubble stream to circulate throughout the
entire loop before a bubble can be read or destroyed.
Access times are typically high, at around 370 ms. Detection
can be destructive or nondestructive.
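A rough feel for why a single serial loop is slow can be had from a back-of-the-envelope calculation. The sketch below assumes that one bit position passes the detector per field revolution and that, on average, a wanted bit is half a loop away. The function name serial_loop_access_ms and the loop size and field frequency used are hypothetical, chosen only for illustration, and are not figures taken from the references.

```python
def serial_loop_access_ms(loop_bits, field_khz, worst_case=False):
    """Time for a wanted bit to circulate to the detector of one serial loop.

    One stage is traversed per field revolution, so the wait is the number
    of stages to travel divided by the field frequency.
    """
    stages = loop_bits if worst_case else loop_bits / 2
    # stages divided by kHz (thousands of stages per second) gives milliseconds
    return stages / field_khz

# Hypothetical values, for illustration only.
print(serial_loop_access_ms(loop_bits=64_000, field_khz=100))
print(serial_loop_access_ms(loop_bits=64_000, field_khz=100, worst_case=True))
```

Under these assumed numbers the average wait is in the hundreds of milliseconds, the same order of magnitude as the access time quoted above. Splitting the storage into many short loops shortens the distance any one bit must travel.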

The first scheme of major-minor loop bubble memory
architecture is the transfer gate system (see Figure 11b).
Transfer gate major-minor loop architectures are
constructed with a major loop which connects directly to
the generation, detection, replication, and annihilation
devices for reading and writing on one side and to a
parallel series of minor loops on the other. This system has
serial input, parallel storage, and serial output. Minor
loops are connected to alternate bit positions along the
major loop, all of which are enclosed in a transfer gate.
Data is written into the major loop in alternate bits and
shifted around the major loop until the first bit arrives at
the first minor loop, the second data bit at the second
minor loop, and so on until each minor gate entrance holds a
bit. Then the transfer gate is pulsed to enter the data into
storage. Old data must be read out serially from each minor
loop at the respective minor loop exit before new data bits
can occupy memory previously occupied by old data.

There is no "write-over" procedure available. Rather,
old data must be destructively read before new data can be
entered into the same address. Control circuitry ensures new
data is inserted only into the previously vacated memory
slot occupied by the old data. When data is only to be read
and not replaced, it must be replicated so that one copy
returns via the major loop to the minor loop storage and the
other copy is read by the detector and then annihilated.
As may be surmised, the transfer gate architecture is
not fast enough for some applications because of the
alternate spacing between minor loops.
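The transfer gate behavior described above can be made concrete with a small simulation. This is a toy model under stated assumptions, not a description of any vendor's part: the class name TransferGateMemory and its methods are hypothetical, a "page" is taken to be one bit from the same position in every minor loop, an empty slot is modeled as all zeros (no bubbles), and a major loop transit is assumed to cost two field revolutions per minor loop because of the alternate bit spacing.

```python
from collections import deque

class TransferGateMemory:
    """Toy major-minor loop memory with a single bank of transfer gates."""

    def __init__(self, n_loops, loop_len):
        self.n_loops = n_loops
        self.loop_len = loop_len
        self.minor = [deque([0] * loop_len) for _ in range(n_loops)]
        self.at_gate = 0   # page currently aligned with the transfer gates
        self.cycles = 0    # field revolutions spent so far

    def _rotate_to(self, page):
        # All minor loops step together until the wanted page is at the gates.
        steps = (page - self.at_gate) % self.loop_len
        for loop in self.minor:
            loop.rotate(-steps)
        self.at_gate = page
        self.cycles += steps

    def _major_loop_transit(self):
        # Assumption: minor loops sit at alternate positions of the major loop.
        self.cycles += 2 * self.n_loops

    def take_page(self, page):
        """Destructive read: pulse the gates and leave the slots empty."""
        self._rotate_to(page)
        bits = [loop[0] for loop in self.minor]
        for loop in self.minor:
            loop[0] = 0
        self._major_loop_transit()
        return bits

    def put_page(self, page, bits):
        """Write into vacated slots only; there is no write-over."""
        self._rotate_to(page)
        if any(loop[0] for loop in self.minor):
            raise ValueError("slot occupied; old data must be read out first")
        self._major_loop_transit()
        for loop, bit in zip(self.minor, bits):
            loop[0] = bit

    def read_page(self, page):
        """Read-only access, modeled as a destructive read followed by
        returning a replicated copy to the same slots."""
        bits = self.take_page(page)
        self.put_page(page, bits)
        return bits

mem = TransferGateMemory(n_loops=4, loop_len=8)
mem.put_page(3, [1, 0, 1, 1])
print(mem.read_page(3), mem.cycles)
```

The cycles counter makes the cost structure visible: every access pays for rotating the minor loops into position and for at least one serial transit of the major loop, and a read that keeps the data pays for the transit twice.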

The second scheme of major-minor loop bubble memory
architecture is the block replicator transfer system (see
Figure 11c). More organizational separation of function
is utilized, resulting in a scheme which is twice as fast as
the transfer gate version. The major loop is divided into
two write lines at one end of the minor loop bank and two
output read lines at the other end. The minor loop bank is
divided into even-bit and odd-bit storage banks, with each
bank having its own generator and major loop write line. The
odd bank has an extra bubble position so that identical data
bits are offset from those in the even bank. Old data is
destructively read out in the same basic way as in the
transfer gate system; however, in this architecture,
auxiliary control circuitry times the rotation of the minor
loops and the transfer and replicate gates so that new
data properly replaces old data. That is, new data is
written only where slots vacated by old data are located.
The advantage of having the read and write functions
separated is that they have their own dedicated loop
connections. As soon as a vacant memory spot is available on
a minor loop, new data from the write end of the minor loops
can be entered into the vacancy. Consequently, there is no
need to wait for outgoing data to clear the major loop
before the arrival of new data. This is in direct contrast
to the corresponding actions in the transfer gate scheme.

To simplify data read-out, the control circuitry
collects a bubble from each minor loop at the read end
of the minor loop banks, where the replicator gate for each
such loop is located. The block replicator is then pulsed,
resulting in the replicated copies being kept in the
minor loop storage. As with the transfer gate
architecture, the block replicator architecture employs
nondestructive detection for read-only access to data.

The third and final major-minor loop bubble memory
architecture scheme is the swap gate scheme (see Figure
11d), which replaces the bank of transfer gates at the write
end of the minor loop banks with a bank of swap gates. This
bank allows old data to be transferred to write/swap exits
at the same time new data is available at the swap/write
entrances to the minor loops. When the swap gate is
energized, new and old data merely swap places. New
data is stored in the minor loops and old data is whisked
away by the major write lines to be erased by the
annihilator. The obvious advantage of this scheme is that
a lot of data does not have to be erased before new data is
written. This architecture also uses nondestructive
detection for read-only access to data.

D. BUBBLE MATERIALS

Certain elements and their alloys (Fe, Co, Ni, f^o)
along with other substances exhibit the well-known property
of magnetism. This property permits a material's atoms
to achieve a high degree of alignment despite the
atoms' tendency towards randomization due to some type
of thermal motion. Materials can be shaped such that their
magnetization lies along a particular direction. Several
important properties of magnetism are exhibited when a
magnetic substance is subjected to an external field.
First, a relative increase in the external field will
cause a relative increase in the substance's magnetic
field. Secondly, if a single, thin crystal film of certain
magnetic materials is shaped perpendicular to the axis of
the original magnetism, the results are wavy strips of
matter having alternating directions of magnetization
which are perpendicular to the surface of the film. Thus,
it is the combination of these two properties which
supplies an environment for a
Of the available bubble materials, the most common and
currently most utilized is a cubic structure garnet, which
includes rare earth (Re) and iron (Fe). Magnetic garnet
films can easily be tailored to produce specific magnetism
along a desired direction, as well as to enable the
coercivity to be better controlled. Also, satisfactory
operation can be sustained with these garnets over a
temperature range extending from room temperature up to
70-100 degrees centigrade. Moreover, the Curie temperature,
which is the temperature at which demagnetization occurs,
is fairly constant. This of course provides useful bubbles.
The size of bubble created can vary from substances for

Other films include hexaferrites (such as BaFeO),
amorphous materials (such as ReTu alloys), and
orthoferrites. Hexaferrites are hexagonal, and thereby
have a crystalline anisotropy which is adequate for
bubbles. They represent a class of materials with a
higher coercivity than garnets and the capability of
creating smaller bubbles (<.5 um). However, the structure
tends to grow more rapidly perpendicular to the axis,
thereby making good films more difficult to produce.
Although the bubble velocity is faster than in garnets, the
Curie temperature is more varied. Also, at this time, since
only small bubbles can be created, their uses are uncertain
and limited.

Amorphous films, being amorphous, are not single
crystals like garnets. They are less expensive than
other materials, but are too sensitive to variations in
temperature. Also, their bubble velocity is slower than
that of the other materials. The size of bubbles is
slightly better than in hexaferrites (.2 to .6 um), but
still much less than in garnets.

The orthoferrites were the first materials to be
utilized for MBM. Their magnetization is much too low to be
very useful and only large bubbles can be created, in the
range of 50-100 um [Ref. 11].

E. ADVANTAGES

The following are some of the advantages of bubble
memory over conventional semiconductor technologies [Ref.


G.  BUBBLE  PRODUCTS 

Almost every major electronics company in the world was
initially involved with magnetic bubble research. Because
either the technology was too complex or the bubble device
did not hold enough business potential, many development
programs were abandoned. Thus, it is no wonder that many
companies dropped out altogether. Only Intel remains in the
bubble development field.

Figure 12 compares some of the existing bubble
products. The first commercially offered product was Texas
Instruments' (TI) 92 Kbit memory module, the TIB0203,
in 1978. It employed a major/minor loop architecture
with 157 loops, 13 of which were redundant. TI followed
this with three higher capacity units which employed a
block replicate architecture. Soon Rockwell and Fujitsu
also entered the market with bubble devices of their own.

Early in 1979 Intel introduced the first 1-Mbit device
on the market. This device also included all the support
components needed to turn the magnetic bubble device into a
magnetic bubble system. These support elements included a
controller, a formatter/sense amplifier, a coil pre-driver,
a coil driver, and a current pulse generator. The
controller interfaces with the microprocessor system and
converts microprocessor read/write commands into the
necessary control signals to carry out the selected
operation within the MBM system. The formatter/sense
amplifier has several functions. First, it contains two
sense amplifiers for the detection of bubble signals
produced at the detector output. Second, upon system
initialization it stores the redundant loop information
in an internal loop register and ensures that the sense
amplifiers identify the correct bits at the detector
output. Third, this element contains an error
correction mechanism which improves the reliability of
input and output. The current-pulse generator uses the
control signals to enable the correct current sources for
the desired operation. It also includes power-fail circuitry
to preserve the integrity of the data if power is suddenly
lost. Finally, the coil driver produces the high-value
currents needed to create the required magnetic fields.
Intel was followed by National Semiconductor with a 256-Kbit
device, and it too had all the necessary support elements.

Figure 12. Comparison of Magnetic Bubble Memory 1983
(Table columns: Manufacturer, Device, Capacity, Design
Approach, Support Circuits, Power-Fail Protection, Error
Correction. Manufacturers listed: Texas Instruments,
Rockwell International, National Semiconductor, Intel
Magnetics, Motorola, Fujitsu.)

Today, the only US company that is still involved in
this field is Intel. Intel has recently announced
enhancements of its highly sophisticated bubble memory
controller (BMC). One can support up to an entire
megabyte, and the other up to four megabytes [Ref. 9].

H. FUTURE TRENDS

Although the MBM panacea has disintegrated, its future is
not as bleak as expected. Today, military applications
provide the major need for MBM. Intel has recently
announced a 4-Mbit chip with the capability of storing 1
megabyte, and a 4-megabyte capability is on the near
horizon. Figures 13 and 13a depict actual and anticipated
trends in the chip capacity vs. the year, and the price/bit
vs. the year, respectively. In Figure 13, we note that the
projected chip capacity for 1985 falls short by 6 Mbits. The
new projection for 10 Mbits is in the 1990 time frame.
Also note that Figure 13a illustrates that the price per bit
has not decreased as expected. For 1985, the price is
approximately $.03 per bit, which is $.02 more than
projected; however, the trend is for lower costs.

As the technology progresses, the cost decreases, the
access time (currently, 9h Mbits/sec) is reduced, and
capacity increases, MBM can play a vital role as a
supplement to other technologies. Since numerous Japanese
companies have taken up where US companies dropped off, the
future remains optimistic for this technology.

I. SUMMARY

Bubble memory technology, although it did not prove to be
the panacea that many had thought, is suited for certain
tasks. Its portability and reliability make it an ideal
candidate for those tasks where tremendous speed is
not required, but rather durable service over a long
period of time. Such uses include control machinery,
recorded messages, remote sites, places where minimum
maintenance costs are desired, instances where vital
information cannot be risked, and memory cassettes or
devices which must be transferred over distances.

Figure 13. Bubble Memory - Chip Capacity


Military uses of MBM do exist. Intel is devoting
considerable research and development effort to MBM for
military usage. It has experimented with extending the
operating temperature range up to 85 degrees centigrade
for operational uses. Moreover, when the cost is reduced
and the access time is improved, many industrial uses may


IV. THE VERTICAL RECORDING


The development of magnetic recording has been a
history of pursuing higher recording density. High
density recording is precisely the goal of vertical
recording. Several research efforts toward vertical
recording took place during the late 1950's. Since the
desired performance had not been achieved (i.e., the
performance of vertical recording could not match or
exceed that of conventional recording), the vertical
research was abandoned.

In the early 1970's Professor Iwasaki and his
coworkers at Tohoku University discovered that high
density recording is inhibited by the well-known effects
of recording demagnetization. This led to the
renewal of research on a practical method of vertical
recording. Systematic research on vertical recording,
however, did not start until 1975, and by 1983
approximately 140 reports on vertical recording had
been presented in the related field. This eight-year period
has seen a slow, but steady, elevation of this subject to
the rank of major research on magnetic recording. This trend
is expected to intensify in the future [Ref. 12].




A. CONCEPTS, OPERATIONS AND CHARACTERISTICS

The concepts and operations of vertical recording
are similar to those of conventional recording. The only
difference is that the key to this new method lies in
magnetizing the disk surface material at right angles, i.e.,
at angles vertical to the surface. In contrast,
conventional recording creates magnetized zones along
the surface. With vertical recording, the magnetized
regions span the depth rather than the length of the
medium. Consequently, raising the recording density no
longer worsens the demagnetizing effect. In fact, the
opposite is true. This effect is explained in the following
sections. Because the recorded magnetic zones are vertical
to the disk surface, higher densities now squeeze their
waistline dimensions, rather than their length (see
Figure 14).

B. THE ARCHITECTURE

Figure 15 depicts the vertical head being utilized today
in vertical recording. It consists of a main pole made
of a thin magnetic film, which is less than 1 um thick,
placed vertical to the disk surface, and an auxiliary pole
made of a thick ferrite film and located on the other side
of the recording medium. On the tip of the auxiliary pole is
a coil, which is used for reading and writing. The gap
between these two magnetic poles is less than 100 um.


Writing is performed as the current in the coil
induces a concentrated magnetic flux on the main pole.
This flux is shown by dashed lines in Figure 15. In the
reading process, meanwhile, the magnetic field of the
medium magnetizes the main pole and induces a voltage in
the coil.

This head is characterized by a strong interaction
between the main pole and the magnetic layer of the medium.
This operation is carried out by the concentration of the
magnetic flux from the main pole into the magnetic layer
of the medium. Consequently, only the vertical magnetic
field on the tip of the main pole becomes significantly
strong. In addition, as the width of the vertical field
is governed only by the thickness of the main pole, a
purely vertical magnetic field is always applied to the
medium regardless of the recording level.

In conjunction with the above vertical head, a double
layer medium is used. This is done to enhance the reading
and writing process tenfold. The magnetic interaction
between the main pole and the magnetic layer of the medium
is therefore much enhanced.

C. THE VERTICAL MEDIUM

Although the same medium materials as in
conventional recording can be utilized, cobalt chrome
(Co-Cr) film is best suited for this type of recording.
Co-Cr film has a wide range of variations which are not
found in the other materials. First, Co-Cr film has the
largest vertical anisotropy. Secondly, both Co and Cr are
soluble in a composition where ferromagnetism may appear,
making them more controllable, with a coercivity from
100-2200 Oe. Finally, the Co-Cr film has the distinctive
feature that it is composed of closely packed columnar
particles. These particles are physically small enough and
sufficiently independent of one another magnetically to
permit ultra-high density recording. This columnar
structure is not found in other medium materials. Thus,
Co-Cr double layer film is currently the leading candidate
for the vertical recording medium [Ref. 12].

D. PROPERTIES OF THE VERTICAL RECORDING

In vertical recording, the adjacent magnetized
regions are in anti-parallel states; thus, an attractive
force exists between each pair of residual magnetization
regions, making them stable. Therefore, a sharp
magnetization transition (that region that is subject to
demagnetization) can be obtained even in high-density
recording without being affected by demagnetization.
There is no limitation due to demagnetization imposed
on the recording density for vertical recording [Ref.
13]. This high-density recording can be achieved simply by
using a thinner main pole.


E. ADVANTAGES OF THE VERTICAL RECORDING

With the prospect of vertical recording becoming a
reality, there is a great deal of discussion on the future
of conventional recording. Vertical recording
offers the following advantages as compared to
conventional recording:

(1) greater linear density (vertical recording
has 100,000 flux reversals per inch, as compared
to 15,000 flux reversals per inch for
conventional recording),

(2) greater areal density (vertical recording
has 10 to the 10th flux reversals per square inch,
as compared to 16& times 10 to the 6th flux reversals
per square inch for conventional recording),

(3) thicker medium (for vertical recording, the medium
may be thicker than the ones for conventional
recording, since bits are recorded vertically to the
medium),

(4) reduced demagnetization (as the lambda gets shorter
for vertical recording as depicted in Figure i »,
the adjacent regions are in close, opposed fields,
making demagnetization difficult, whereas in
conventional recording the adjacent regions are still
far apart in opposed fields, making demagnetization
easy), and

(5) small transition length (it is so small that it is
close to zero for vertical recording).

F. DISADVANTAGES OF THE VERTICAL RECORDING

The future development of vertical recording
will require extensive investigations of new heads and
media. Only by developing new heads and decreasing the
cost to manufacture media for vertical recording can
we fully exploit the successful application of this
technology.


G. FUTURE TRENDS

Vertical recording is being developed mainly by an
alliance of Japanese industry and universities. In this
country, the Magnetics Research Laboratory at the
University of Minnesota is seriously pursuing the
potential of this new technology. The Vertimag Systems
Corporation is the only company in the United States
reportedly involved in vertical recording. The Japanese,
on the other hand, have a massive effort going on in
vertical recording. In 1982, the first International
Symposium on the Vertical Recording was sponsored in Japan.
Of the 23 papers presented on this topic, only three were by
U.S.A. authors, and all three were from Vertimag. The other
20 were by Japanese authors.

Virtually every well-known Japanese electronics
company is working on this technology. These companies
include Hitachi, Toshiba, Fujitsu, Nippon Electric
Company, Sony, Matsushita, and a number of smaller
organizations. The announcement of a 3 1/2-inch,
vertically-oriented prototype floppy disk in 1983
represents the level of Japanese achievement and dominance
in this field. They anticipate production now.


Once a medium is available at a mass-production
price and the technology is well understood, there will
be a rapid movement into this field by companies in the
U.S.A. The rate of development and market penetration
is likely to be constrained for the near future
because of the slow and expensive process to fabricate
the media, the large capital investment for the sputter
system, and the requirement for a new type of head. It is
expected that vertical recording may complement
conventional recording for at least the next ten years.

Vertical recording, as the Japanese have already realized,
represents the next level of magnetic recording technology
for the not-too-distant future.


V. THE OPTICAL RECORDING


A. AN INTRODUCTION

In today's society, the expansion of our knowledge has
generated data in ever-increasing volumes and given rise to
the need for their efficient long-term storage. Storing
(writing) these data requires economical, compact, and
high-speed mass memory systems. Retrieval (reading) of these
data requires random-access capability to the selected
data.

Over the years, the manufacturers of conventional
storage devices have been able to increase storage
capacities to keep pace with the growth in data storage and
retrieval requirements. However, even more dramatic
advances in storage capacity are needed to satisfy these
newly emerging requirements. Although conventional
recording has much room for future growth (i.e., doubling
its capacity every 30 months), this is an evolutionary
development rather than a dramatic leap in storage
capacity.
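
The doubling rate quoted above implies simple exponential
growth. The following Python sketch projects capacity under that
rule. It assumes the doubling period stays fixed at 30 months,
and the function names and sample values are illustrative only.

```python
import math


def projected_capacity(initial: float, months: float,
                       doubling_period_months: float = 30.0) -> float:
    """Capacity after `months`, doubling once per doubling period."""
    return initial * 2 ** (months / doubling_period_months)


def months_to_reach(factor: float,
                    doubling_period_months: float = 30.0) -> float:
    """Months needed for capacity to grow by the given factor."""
    return doubling_period_months * math.log2(factor)


if __name__ == "__main__":
    # A hypothetical 100-megabyte device, five years (60 months) later.
    print(projected_capacity(100.0, 60.0))
    # Time needed for a tenfold increase at the same rate.
    print(months_to_reach(10.0))
```

Under this rule a tenfold increase takes a little over eight
years, which is the sense in which the growth is evolutionary.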

An attractive new technology to satisfy these
high-capacity data storage needs may be the optical
recording, which makes use of a highly focused laser beam.
Research and development of this high-density optical data
storage actually began over 20 years ago, with the invention
of the laser. The term laser is an acronym for light
amplification by stimulated emission of radiation; that is,
a light amplifier. The process of stimulated emission can be
described as follows. When atoms, ions, or molecules
absorb energy, they can emit light spontaneously, as in the
case of an incandescent lamp. A light wave may also be used
to stimulate the emission. Thus, stimulated emission is
the opposite of stimulated absorption, where unexcited
matter is stimulated into an excited state by the light
wave. If a collection of atoms is prepared so that more
are initially excited than unexcited, then an incident
light beam stimulates more emission than absorption, and
there is a net amplification of the incident light beam.
This is the way that the laser amplifies [Ref. 14].
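
The amplification condition described above can be sketched
numerically. The Python fragment below assumes the common
textbook model in which beam intensity grows or decays
exponentially with the difference between the excited and
unexcited populations. The model, the function name, and all
numerical values are assumptions made for illustration and are
not taken from Ref. 14.

```python
import math


def output_intensity(input_intensity: float, n_excited: float,
                     n_unexcited: float, cross_section: float,
                     length: float) -> float:
    """Intensity after passing through the medium.

    Stimulated emission adds light in proportion to n_excited;
    stimulated absorption removes it in proportion to n_unexcited.
    """
    gain_coefficient = cross_section * (n_excited - n_unexcited)
    return input_intensity * math.exp(gain_coefficient * length)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the three cases.
    sigma, length = 1e-20, 10.0
    print(output_intensity(1.0, 2e18, 1e18, sigma, length))  # more excited: gain
    print(output_intensity(1.0, 1e18, 2e18, sigma, length))  # fewer excited: loss
    print(output_intensity(1.0, 1e18, 1e18, sigma, length))  # equal: unchanged
```

Only the first case, with more atoms excited than unexcited,
gives an output stronger than the input.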

Like the conventional recording, the optical recording
encompasses a family of configurations that address the
many requirements of data storage users. In the optical
storage technology three configurations exist: read only,
write once, and erasable recording. The principles of
operation, architecture, applications, technological
implications, media types, capacity, cost, future
trends and problems of the optical recording will be
discussed in the following sections.


B. PRINCIPLES OF OPERATIONS


Figure 16 depicts the write and read operation for optical
recording. First, the process for the write operation is
reviewed. In the write operation, the drive focuses a
high-power laser beam on the underside of the disk and into
the preformed track (pre-embossed data bits). The beam
passes through the disk substrate (i.e.,
polymethylmethacrylate, PMMA, which is an absorbing layer),
strikes the thin metal coating (i.e., the aluminum
reflective layer), and heats the coating. The coating
consequently becomes soft. The heat energy is then
transferred to the PMMA substrate, which generates gases
when heated. These gases push up on the metal layer to
create bubbles, which are approximately ,h µm. Thus, data
is recorded.

The process for reading data is more straightforward.
In the read operation, a low-power laser beam detects the
presence of bubbles by measuring the changed intensity of
the light reflected from the disk surface. Thus, data can
be read.
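
The read step amounts to classifying each bit cell by its
reflected intensity. The Python sketch below illustrates the
idea under two assumptions that are not stated in the text: a
bubble lowers the reflected intensity, and a bubble stands for a
one. The threshold and sample values are hypothetical.

```python
def decode_reflectance(samples, threshold=0.5):
    """Turn one reflected-intensity sample per bit cell into bits.

    Assumption: a bubble scatters the read beam, so a sample below
    the threshold is read as a bubble (1) and anything else as
    flat, unwritten coating (0).
    """
    return [1 if sample < threshold else 0 for sample in samples]


if __name__ == "__main__":
    # Normalized intensities for four bit cells (hypothetical).
    print(decode_reflectance([0.9, 0.2, 0.85, 0.1]))
```

A real drive would also have to recover bit timing and correct
errors, which this sketch leaves out.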

C. THE ARCHITECTURE

Figure 17 illustrates a simple optical disk memory
architecture. It employs laser light to write data by
burning holes in the medium on a spinning disk. The laser
is used both for reading and writing. Only the intensity of
the laser beam is different.

The optical disk architecture works as follows. First,
the laser emits a beam of coherent light that is broken by
a diffraction grating into essentially three parallel beams.
Two of the three beams are later used for detecting
tracking errors. The third beam, which is the strongest of
the three, is the main reading beam. These three beams,
moving alongside each other, are then focused by a
collimating lens. The beams then pass through a special
Wollaston prism or polarizing beam splitter (PBS), which
allows the vertically polarized projection beams to pass
directly through, but separates the reflected light. The
projected beam continues through a quarter-wavelength
retardation plate, which brings the light back into
focus. This changes the polarization characteristics of
the beam, which is then directed by a tracking mirror
and finally focused onto the disk by the objective
lens, thereby allowing writing or reading to occur.

If the process is to read, on the return trip the
reflected light retraces the path to the retardation plate.
This modifies the polarization, allowing the prism to
bend it at right angles to the projected beam and prevent
any type of feedback into the laser. Then, the cylindrical
lens focuses this separate reflected beam, which falls on a
photo receptor array, which in turn is composed of photo
diodes. The function of the photo receptors, which also
control tracking and focus, is to read directly the
variation in beam intensity, which encodes the digital
data on the disk.
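
The way two passes through the retardation plate let the prism
separate the returning light can be sketched with Jones
calculus. The Python fragment below is an idealized
illustration, not a model of the drive in Figure 17. It assumes
a lossless quarter-wave plate with its fast axis at 45 degrees,
a perfect mirror in place of the disk, and it ignores the
coordinate flip on reflection. All names are illustrative.

```python
import math

# Jones matrix of a quarter-wave plate with its fast axis at 45 degrees,
# written up to an overall phase factor.
SCALE = 1.0 / math.sqrt(2.0)
QUARTER_WAVE_45 = ((SCALE * 1, SCALE * -1j),
                   (SCALE * -1j, SCALE * 1))


def apply(matrix, vector):
    """Multiply a 2x2 Jones matrix by a 2-component Jones vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)


def round_trip(vector):
    """Outbound pass, ideal reflection, then the return pass."""
    return apply(QUARTER_WAVE_45, apply(QUARTER_WAVE_45, vector))


if __name__ == "__main__":
    projected = (1 + 0j, 0 + 0j)   # polarization passed by the prism
    returned = round_trip(projected)
    # Power left in the original polarization and in the orthogonal one.
    print(abs(returned[0]) ** 2, abs(returned[1]) ** 2)
```

In this idealization all of the returning power ends up in the
orthogonal polarization, which is the component the prism
deflects toward the photo receptors instead of back into the
laser.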

The two weaker tracking beams and the primary laser beam
are focused by the objective lens on three different spots
on the disk (see Figure 17a). The intensities of the two
reflected tracking beams are compared by separate areas of
the receptor array. Differences between their intensities
are interpreted as tracking errors, which are corrected by
the tracking mirror. The focus, on the other hand, is
controlled by detecting changes in the shape of the primary
beam. When the disk is in focus, the cylindrical lens will
project the reflected beam as a circle on the array of four
photo diodes (see Figure 17b). When the disk moves closer
to or further from the objective lens, the projection
becomes elliptical, with more light falling on one diagonal
pair of receptors. This difference is detected as focus
error, and a servo mechanism adjusts the objective lens
[Ref. 15].
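
The two error signals described above reduce to simple
differences of photo-diode readings. The Python sketch below
shows one way to form them. The sign conventions, the ordering
of the four diodes, and the sample readings are assumptions made
for illustration.

```python
def tracking_error(side_beam_a: float, side_beam_b: float) -> float:
    """Difference between the two reflected tracking-beam intensities.

    Zero means the main beam is centered on the track; the sign tells
    the tracking mirror which way to move (convention assumed here).
    """
    return side_beam_a - side_beam_b


def focus_error(quadrants) -> float:
    """Difference between the two diagonal pairs of the four photo diodes.

    `quadrants` lists the four readings in order around the array, so
    items 0 and 2 form one diagonal pair and items 1 and 3 the other.
    A circular spot lights all four equally and gives zero.
    """
    q0, q1, q2, q3 = quadrants
    return (q0 + q2) - (q1 + q3)


if __name__ == "__main__":
    print(tracking_error(0.5, 0.5))             # on track
    print(focus_error((1.0, 1.0, 1.0, 1.0)))    # circular spot, in focus
    print(focus_error((1.2, 0.8, 1.2, 0.8)))    # elliptical spot
    print(focus_error((0.8, 1.2, 0.8, 1.2)))    # ellipse along other diagonal
```

The sign of the focus error indicates along which diagonal the
ellipse lies, and therefore in which direction the servo should
move the objective lens.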

D. APPLICATIONS OF OPTICAL RECORDING

Applications for optical recording are similar to
those of conventional recording. The difference is
that optical recording has greater capacity and lower
cost. However, optical recording has a longer access
time. There are also other advantages and disadvantages,
which are discussed later. The following are some
applications of the optical recording:

(1) archive applications,

(2) reference-file applications,

(3) backup for conventional disk files,

(4) collections of large sets of raw operational
data,

(5) large relatively stable conventional files
previously saved on conventional disks,

(6) file versions or snapshots of files,

(7) very high-density storage,

(8) removable media,

(9) large capacity per media unit,

(10) permanent, nonerasable, nonmodifiable storage,

(11) fast sequential data recording capability,

(12) fast sequential data retrieval capability,

(13) moderately fast direct-access data retrieval
capability,

(14) high level of data integrity, and

(15) low cost of on-line storage.

Figure 18 illustrates five specific applications of the
optical recording [Ref. 16].

E. CURRENT OPTICAL-RECORDING STATUS

Figure 19 depicts the three categories of optical
recording as well as their capacity, applications, and
media and drive costs [Ref. 17]. Figure 19a illustrates the
two types of read-only optical data storage disks currently
available, along with their characteristics.

Figure 17a. Tracking Error Detection.

The CD-ROM (compact-disk, read-only-memory) disk can
now be manufactured in quantity, and thus can become a
medium for large textual databases. Its use can make
obsolete such references as telephone directories, law
libraries, medical histories, book references, and library
catalogs. Archival database services can sell their complete
historical data on a few CD-ROM disks.

Unfortunately, applying compact disk technology to
computers is not as simple as one may think. One of the
greatest hurdles is standardization. Although the data on
CD-ROMs is organized in a standard way, a standard
hardware interface between the players and personal
computers has yet to emerge. A hardware interface
standard is essential, because audio CD players are
designed to transfer data serially, while most personal
computers use a parallel scheme for communication with
disk drives. Settling on a standard hardware interface will
also allow the creation of the operating system for CD-ROM.

The Small Computer Systems Interface (SCSI) is one,
though not the only, proposal for standardization. It
is based on the Shugart Associates System Interface
(SASI), which is already used for hard disks in
personal computers. Other proposals include the IEEE-488
bus and high-speed RS-232 serial transfer.

The following applications are more suited to
optical storage than to magnetic because:

Magnetic Disk   - $/megabyte too high
                - Volume of data too large
                - Not portable

Magnetic Tape   - Physical storage space too large
                - Capacity per reel too low
                - No direct access
                - Media life too short

Optical Storage - Very high capacity
                - Low $/megabyte
                - Direct access
                - Portable
                - Long media life

APPLICATION 1

Application: Extremely large quantities of digital data
Industry:    Energy exploration
             - Seismic data
             - Well information
             - Satellite data
Benefits:    - Protect data; value increases through time
             - Increase productivity of technical staffs
             - Efficient decision making; increase profit

APPLICATION 2

Application: Storing and retrieving images produced by
             nuclear and diagnostic medical equipment
Industry:    Medical
             - Patient information
             - Diagnostic procedures
Benefits:    - Protect data: x-rays and tests
             - New diagnostic methodologies
             - Accurate decisions: life saving

APPLICATION 3

Application: Storing and distributing large reference files
             (books, periodicals, catalogs, abstracts)
Industry:    Libraries
             - University
             - Law
             - Retail catalogs
Benefits:    - Protect data: case histories, abstracts
             - Distribution; mail platters
             - Increase efficiency

APPLICATION 4

Application: Storing, retrieving and distributing images
             (maps and engineering drawings)
Industry:    Manufacturing
             - Topographic maps
             - Drawings
             - Road maps
             - Weather maps
Benefits:    - Protect data; track change through time
             - Distribution
             - Cost reduction

APPLICATION 5

Application: Office automation - document storage
Industry:    All
             - Single electronic copy
             - Electronic file cabinet
             - Electronic mail
Benefits:    - Reduce cost, replace paper
             - Increase productivity
             - Efficient decision making

Figure 18. Five Applications of the Optical Storage

Read-Only

  Media type:      Factory replicated plastic disk with
                   embossed surface
  Media capacity*  1 hr continuous video
  (both sides,     100,000 video frames
  30 cm disk):     1 hr digital audio*
                   2-a GB data
  Applications:    Consumer entertainment
                   Education/training
                   Program distribution
                   Database distribution
                   Videogame ROM
  Media cost ($):  $2-10/GB
  Drive cost ($):  $0.5-5K

Write-Once (Read/Write)

  Media type:      Various thin film metal or organic
                   materials
  Media capacity   2-8 GB data
  (both sides,     20K-100K A4 doc.
  30 cm disk):
  Applications:    Document storage
                   Archival database (tape replacement)
                   On-line mass storage (juke-box)
  Media cost ($):  $10-50/GB
  Drive cost ($):  $5-20K

Erasable (Read/Write)

  Media type:      Magneto-optic or phase-change thin
                   film materials
  Media capacity   1-4 GB data
  (both sides,
  30 cm disk):
  Applications:    High capacity, low cost store for small
                   systems
  Media cost ($):  $10-50/GB
  Drive cost ($):  $5-20K

*12 cm disk, 1 side

Figure 19. Classification of Optical Data Storage


1. Digital-Audio Disk (DAD, 1983)

   ... 120 mm Diameter Disk - 1 Hr. Play
   ... Sony / Philips Format Standardization
   ... First High Volume Product for Optical Technology
   ... BER < 10

2. Digital-Data Disk (CD-ROM, 1984)

   ... Based on Digital Audio Disk Technology
   ... 550 MB Disk Capacity
   ... Sony / Philips Format Standardization
   ... Playback Unit Price: $1500.00
   ... Access Time < 3 s, BER 10
   ... Extendable to Low-End Read/Write Systems

Figure 19a. Two Types of Read-Only Optical Data Storage


Figure 19b illustrates the current write-once optical
data storage products, along with their characteristics and
the companies who manufacture them. In January,
Optotech, Inc., of Colorado, will introduce WORM (write
once, read mostly memory) for personal computers. The
ability to write data on the disk once within the
computer is the difference between WORM and CD-ROM. Thus,
once data has been written, the device becomes a read-only
device. These devices are currently under development, and
are to be introduced into the marketplace in the near
future.

WORM will be used for internal databases such as
end-of-year financial data, inventories, customer lists,
parts lists and other large collections of data
developed within a personal computer. The Optotech
5984 is the WORM drive designed to interface to
personal computers. Its double-sided 400-megabyte disk
offers 000 megabytes of on-line storage. Its cost
when volume production begins is about the same as a
li-megabyte Winchester drive, representing a five-fold
decrease in the cost per stored bit of data (approximately
s,lo per megabyte) [Ref. 19].

This write-once data storage disk is approximately one
and one-half years ahead of the multiple-write optical data
storage disk. Recently, Verbatim Inc. of Sunnyvale,
California, announced the successful completion of the first
of three development phases of a multiple-write optical
data storage disk. The development of acceptable media could
lead to relatively rapid introduction of this type of disk.
The anticipated introduction date is early 1987. The major
candidate is magneto-optic recording, which is discussed in
the next chapter.

F. MATERIAL REQUIREMENTS

The design and fabrication of optical data storage media
are seen as the most critical factors in determining the
ultimate usefulness of high-density optical data
storage. For data-processing applications, this fact
reflects the current status of all three media
classifications, although to different degrees.

Optical media must deal not only with the generic
issues of high-density media characteristics, such as media
resolution, noise, microdefects and integrity, but must
also meet some basic requirements of material properties
that are unique to optical data storage, such as good
reading and writing capabilities and an acceptable data
rate. Finally, and of major importance, there are media
lifetime and fabrication cost.

Glass was the first substrate that could be prepared
with good surface quality, high stability and low BER
(bit-error-rate). However, these qualities were
overshadowed by its cost, bulk, and fragility. Glass is an
excellent candidate for archivability.

Figure 19b. Current Write-Once Optical Data Storage Products

Polymers, on the other hand, have low surface quality due
to the molded surface's design. Therefore, careful design
and process control are needed to achieve good surface
quality. The ability to directly pattern the surface with
positional reference data, as well as to provide low cost
and high stability, presents significant motivation to
develop the required fabrication technique and control.
Mass production is already underway.

The aluminum disk substrate used in Winchester drives
represents an intermediate-cost alternative to glass
and polymeric substrates, while offering excellent
dimensional and chemical stability. However, a spin-coated
surface layer must be used to achieve good surface
quality. Also, the format and positional reference data
must be "burned in" after the disk fabrication, which is a
time-consuming and potentially costly procedure. Aluminum
alloys are the best candidates for high performance
[Ref. 20].

Currently, the tellurium-based alloy appears to offer
the best combination of the aforementioned properties, while
including sensitivity adequate to meet the necessary
requirements. Chen et al. [Ref. 21] have reported a 2b%
improvement in writing sensitivity for tellurium-based
(Te) materials relative to polymers. Te-based alloys are
low-melting materials that are easily sublimed, decomposed
or vaporized by the laser heat. Also, the Te alloys have
adequate archivability (over 10 years). However, due to the
high cost of fabricated Te-alloy film (sputtering is
employed) and the nature of its properties (i.e., it must
be handled very carefully, since it is poisonous), polymers
are going to be the leading candidates for optical recording
media.

G. FEATURES AND BENEFITS OF OPTICAL RECORDING

Figure 20 illustrates features and benefits of
optical data storage recording [Ref. S]. A very important
feature not depicted is that the media in optical recording
is encapsulated; that is to say, it is protected from
contamination. The main function of encapsulation is to
keep particulate matter away from the plane of focus at
the information storage layer in order to minimize its
effect on reading quality, particularly BER. A second
function is to shield the storage layer from
potentially corrosive materials, such as water vapor in the
surroundings of the disk. A third function is to protect it
from user abuse, whether the abuse is intentional or not.

H. LIMITATIONS

The total data storage density (the areal density) is
the product of the lineal data density along the recorded
track and the track density in the radial direction.
Optical recording has the following inherent limitations:
lineal density of 300K bits per inch (BPI), track density
of 24K tracks per inch (TPI), and hence areal density of 8 *
10 to the 8th track revolutions per square inch (TRPI).
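
Since the areal density is defined above as the product of the
lineal and track densities, the relation can be checked with a
few lines of Python. This is only a sketch with an illustrative
function name. Note that the product of the figures as printed
(300K BPI and 24K TPI) comes to 7.2 times 10 to the 9th, not
8 times 10 to the 8th, so one of the printed figures may be off
by a factor of ten.

```python
def areal_density(lineal_density_per_inch: int,
                  track_density_per_inch: int) -> int:
    """Areal density per square inch as the product of the two densities."""
    return lineal_density_per_inch * track_density_per_inch


if __name__ == "__main__":
    lineal = 300_000   # bits per inch, as printed in the text
    tracks = 24_000    # tracks per inch, as printed in the text
    print(areal_density(lineal, tracks))
```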

The lineal density is limited by the readout or
playback step of the optical recording, since the finite
resolution of the read beam results in rapidly diminishing
playback signal amplitude. In order to accomplish the
above lineal density, the demands on disk flatness and
focus servo performance can be very challenging. The
track density is limited by the finite diameter of the read
beam, which results in an increased crosstalk signal
(external noise) from the adjacent tracks as the track
separation is reduced [Ref. 22].

I. FUTURE TRENDS

There are three trends in optical recording. The first
trend is to improve the areal density. This can be
achieved by the development of an enhanced servo
control system. This is essential, since a high degree
of accuracy is required between the position of the head and
the location of the data on the media surface.

The second trend is to improve the data transfer rate.
This can be accomplished by the use of integrated arrays of
lasers. The multiple, independently modulated output beams
of the array are focused within the field of view of a
single objective lens, and the data are recorded and
retrieved in parallel from several adjacent tracks at
the same time. The use of the laser array should permit
an extremely high data rate (> 10 Mb/s) to be achieved
without placing excessive demands either on disk rotation
velocity (and therefore servo performance) or on the output
power levels of each individual laser within the array.
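
The benefit of the laser array is that the per-track rates add.
The Python sketch below shows the arithmetic. It assumes ideal
scaling with no overhead, and the per-track rates used are
hypothetical; only the 10 Mb/s target comes from the text.

```python
import math


def aggregate_rate(per_track_rate: float, n_lasers: int) -> float:
    """Combined data rate when n_lasers tracks are handled in parallel."""
    return per_track_rate * n_lasers


def lasers_needed(target_rate: float, per_track_rate: float) -> int:
    """Smallest array size that reaches the target aggregate rate."""
    return math.ceil(target_rate / per_track_rate)


if __name__ == "__main__":
    # Hypothetical: each beam sustains 2.5 Mb/s at the chosen disk speed.
    print(aggregate_rate(2.5, 4))
    # Array size needed for 10 Mb/s if each beam only sustains 3 Mb/s.
    print(lasers_needed(10.0, 3.0))
```

Because the rate per beam can stay modest, neither the rotation
velocity nor the power of any single laser has to rise to reach
the target.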

The third trend is to increase the signal-to-noise ratio
(SNR). This can be accomplished by increasing the read beam
power proportionally to the critical threshold power that
may damage the track. This increase in the read power
is needed each time that the data rate is increased.
Increasing the data rate requires a corresponding increase
in the read beam power to maintain SNR. This can be
accomplished by utilization of a more suitable storage
medium [Ref. 22].

Figure 21 depicts the future trends for the optical
recording, and includes hard and soft magnetic disk storage
technology for comparison. For each technology, there are
two columns of data. The left column of figures represents
the maximum, while the right column of figures is the
minimum.

J. THE SUMMARY

The prospects of optical data storage have continued
to strengthen and grow during the past few years, as
significant developments occurred in almost every facet of
this technology. Optical storage has received a tremendous
impetus from the introduction of the Sony CD digital audio
disk. This consumer item not only has created a broad
acceptance of optical disk devices but, being a high-volume
product, has created mass-production components applicable
to digital storage devices.

Optical data storage continues to demonstrate the
potential to become an important factor in the field of
high-capacity on-line storage for databases. Read-only and
write-once technologies, which are largely complementary to
the existing conventional storage, are already emerging into
the marketplace. The erasable disk technology systems
continue to demonstrate progress, and may one day be an
alternative to conventional magnetic recording devices.

Figure 21. Future Trends of Optical Disks vs. Magnetic Disks.

A. AN INTRODUCTION

The main developmental thrust in multiple-write
data storage (i.e., erasable optical data storage) is the
magneto-optic recording (MOR). As its name implies, the
magneto-optic recording is a combination of conventional
magnetic and optical technologies. This recording has been
around for over fifteen years; but due to unsuitable
media, it has remained dormant until recently. However,
the introduction of newer rare-earth transition-metal
(RE-TM) films, which possess vertical anisotropy (and hence,
the magnetic domains are normal to the film plane), has
placed the magneto-optic recording research and development
into the forefront.

The MOR process is based on two well-known physical
phenomena, the Curie effect and the Faraday effect.
The Curie effect involves raising the magnetic material to
a specific temperature, where the material is most
susceptible to magnetic change (i.e., it is demagnetized).
The Faraday effect is the rotation of the plane of polarized
light as the light passes through a magnetized medium. The
light can rotate left or right, according to the direction
of magnetization. In effect, when the light is reflected
from a magnetic surface, its polarization is changed to
reflect the magnetization of the surface.

MOR has advanced to a stage where it has created a
surge of enthusiasm. This chapter includes its principles
of operation, its architecture, and its technological
implications, which include media, features, benefits,
limitations, and future expectations.

B. BASIC OPERATIONS

The recording process in magneto-optic films requires
the simultaneous application of a bias (externally applied)
magnetic field, directed oppositely to the initial
film magnetization, together with a localized heat pulse
supplied by the focused recording laser beam.
Figure 22 shows schematically how MOR works [Ref. 23]. A
beam of light from a laser is focused onto the surface of
the perpendicularly magnetized thin film, causing the film
to increase in temperature, to the Curie point, in the area
of the laser beam. The localized increase in temperature
causes a localized decrease in coercivity, thereby allowing
the bias field, which is applied antiparallel to the
original magnetization direction, to reverse the direction
of the magnetization in the heated region. On cooling, the
reverse magnetized domain persists. Thus, the writing
occurs.

The same magnetic head is used both for the writing and
the reading. The reading is accomplished with a lower-power
laser beam utilizing the Kerr effect. The Kerr effect
results in a small rotation and some ellipticity being
introduced into the reflected component of the read beam.
Thus, when light is directed to a magnetized surface, the
polarization of the light beam changes slightly. Upon being
reflected back, the light whose polarization has changed due
to the effect of the magnetization of the magnetized area
rotates slightly. This rotation is nearly undetectable, from
.05 to .3 of a single degree, but it is enough to be read by
an optical device.

(a) A focused laser beam raises the local temperature of
the medium so that the applied magnetic field is able to
create a reversed domain. (b) The domain is erased by the
same process, now aided by an oppositely-directed magnetic
field.

Figure 22. The Magneto-Optical Recording Basic Operations.

The erasure process is essentially equivalent to that of
the writing process, except that the direction of the
externally applied bias field is reversed. Due to
speed limitations on switching (i.e., changing the direction
of the current in) the relatively large magnetic field, one
revolution is used to erase the sector (i.e., set the
magnetization to the zero direction) and a second
revolution is used to write the ones on the disk.
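
The two-revolution update described above can be sketched as a
small simulation. The Python fragment below treats a sector as a
list of bits and is meant only to illustrate the sequence of
steps. The function name and the representation of the bias
field are assumptions, not a description of any actual drive
controller.

```python
def rewrite_sector(sector, new_bits):
    """Rewrite a sector in two revolutions; return (result, revolutions).

    Revolution 1: the bias field points in the "zero" direction and the
    laser heats every bit cell, so the whole sector is erased to zeros.
    Revolution 2: the bias field is reversed and the laser is pulsed
    only over the cells that must hold a one.
    """
    if len(sector) != len(new_bits):
        raise ValueError("sector and new data must have the same length")

    # Revolution 1: erase pass, every cell ends up magnetized as zero.
    erased = [0 for _ in sector]

    # Revolution 2: write pass, unheated cells keep the erased state.
    written = [1 if bit else cell for cell, bit in zip(erased, new_bits)]
    return written, 2


if __name__ == "__main__":
    old = [1, 1, 0, 1]
    new = [0, 1, 1, 0]
    print(rewrite_sector(old, new))
```

The result depends only on the new data, at the price of one
extra revolution per update.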

C. MATERIAL REQUIREMENTS

Amorphous rare-earth transition-metal (RE-TM) thin
films are the most widely used media for MOR.
Storage techniques in these amorphous RE-TM films have the
advantages of high bit density and contactless write,
read, and erasure operations. High bit density is
accomplished via the storage of data in a sequence of
magnetic domains, while writing and erasure are performed by
a local temperature rise in conjunction with a low external
magnetic field. Although such materials are erasable and
rewritable, they can still achieve recording densities
comparable to those of write-once optical disks. These
materials are ferrimagnetic in behavior; that is, the
magnetization persists even when the applied field is
reduced to zero (i.e., the material possesses a spontaneous
magnetic moment). In gadolinium-cobalt (GdCo), GdFe, and
terbium-iron (TbFe), for instance, the magnetic moment of
the rare-earth atoms (Gd or Tb) aligns antiparallel to the
magnetic moment of the transition metal (Co or Fe). Since
the temperature dependences of the rare-earth and
transition-metal magnetizations are different, it is
possible to produce alloys which exhibit a temperature
where the rare-earth and the transition-metal
magnetizations are equal and opposite, so that the net
magnetization goes to zero [Ref. 24].

These RE-TM materials are also quite stable against applied
fields and at moderate temperatures. They have high
coercivity, and they can be used for archival storage,
although more testing is necessary to ensure
greater-than-five-year archival storage.

There are various combinations of RE-TM elements suitable for
MOR. Gadolinium-cobalt films were studied early on for this
application, but Imamura determined that it was easier to
stabilize magnetic domains in GdTbFe (gadolinium-terbium-iron)
films [25]. This marked the beginning of the use of ternary
alloys in order to optimize the magnetic and magneto-optical
properties. GdTbFe films have remained the most popular films
for MOR. The advantages of GdTbFe films are that they are
amorphous and ferrimagnetic, and have good Kerr rotation and a
moderate Curie temperature. Thus, RE-TM films have the
performance and stability required for MOR.

There are also other classes of films which are suitable,
although not entirely optimum, for MOR. These are the
polycrystalline films, of which CoFe is the most common. These
films have higher coercivity than the RE-TM films as well as
better SNR (05 decibels for CoFe as compared to 45 dB for
GdTbFe; a decibel, dB, is a logarithmic unit expressing the
ratio of two signal levels. The optical media storage
guideline is 45 dB). This results in lower BER. But these
films have many problems, of which stability is the major one.
Due to their intrinsic properties, these media are relatively
unstable.

D.  FEATURES  AND  BENEFITS  OF  MOR 

The MOR technology is an attempt to abstract from conventional
recording the experience and know-how and from optical
recording the high-density capacity and low cost. The
following are some of the major features and benefits of MOR:

(1) as media, RE-TM alloys are the leading candidates,

(2) multilayer interference coatings are important
to achieving adequate SNR,

(3) the cyclability is good, with greater than 10 to the
7th cycles reported,

(4) the data retention is good,

(5) the drive technology is in place today,

(6) an overwrite requires a sequential erasure,
followed by a re-write,

(7) the stability of the RE-TM films is adequate,

(8) it has the similar performance of optical
recording and the erasability of
conventional recording,

(9) it has removable/portable capability, and

(10) the nondestructive readout is achieved by the Kerr effect.

Figure 23 illustrates three of the major companies that are
developing MOR products, along with their salient
characteristics. Although IBM is not included in Figure 23, it
has recently joined the quest in the development of
magneto-optic disks. The leading contenders are currently
Sharp/Verbatim, 3M, and Matsushita [Ref. 20].

Figure 23. Performance Characteristics of Magneto-Optic Media

E. LIMITATIONS

Experiments have shown that the performance of the
magneto-optic recording media is currently limited by system
parameters, such as the laser wavelength and the numerical
aperture (size of the opening) of the focus lens. Since the
size of the pits on the media is determined by dividing the
wavelength by the numerical aperture and multiplying by a
constant (.56), the best that can be achieved is a wavelength
of 820 nanometers, and a numerical aperture of [illegible]
micrometers to [illegible] micrometers. Hence, optimum values
are not utilized [Ref. 23].
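
The pit-size relation stated above (a constant of .56 times the wavelength, divided by the numerical aperture) can be worked through numerically. The sketch below is only an illustration: the function name `pit_size_nm` is made up for the example, the 820 nm wavelength is assumed as a typical laser-diode value, and the numerical aperture of 0.5 is a hypothetical figure, not one taken from the text.

```python
def pit_size_nm(wavelength_nm, numerical_aperture, k=0.56):
    """Return the approximate pit (spot) size in nanometers.

    Implements size = k * wavelength / NA, with k = .56 as in the text.
    """
    if numerical_aperture <= 0:
        raise ValueError("numerical aperture must be positive")
    return k * wavelength_nm / numerical_aperture


# Assumed inputs: 820 nm laser, hypothetical lens with NA = 0.5.
size = pit_size_nm(820, 0.5)
print(f"pit size: {size:.1f} nm")  # a little under one micrometer

# A shorter wavelength or a larger aperture both shrink the pit.
print(pit_size_nm(780, 0.5) < size, pit_size_nm(820, 0.6) < size)
```

Under these assumed values the pit comes out a little under one micrometer, which shows why the wavelength and the lens, rather than the medium, set the limit on bit density.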

Moreover, since the size of the pits, which is the primary
factor that determines the bit density along the track, is
limited, the density capacity, which is equivalent to the
write-once optical density capacity, is also limited. The
other primary factor that determines the bit density is the
number of bits per recorded pit. The total disk capacity of an
optical disk depends on the number of bits that can be stored
on a single revolution (track) of the disk and the total
number of revolutions (tracks). The major factors that
determine the total number of tracks are the size of the disk
and the track-to-track spacing, which must be sufficient to
reduce crosstalk between the tracks to an acceptable level,
since SNR does limit large bit length.
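
The capacity relationship in the paragraph above (bits per track times number of tracks, with the track count fixed by the recording band and the track-to-track spacing) can be sketched as follows. All names and numbers here are hypothetical, chosen only to make the arithmetic concrete.

```python
def track_count(recording_band_um, track_pitch_um):
    """Number of whole tracks that fit in the radial recording band."""
    if track_pitch_um <= 0:
        raise ValueError("track pitch must be positive")
    return recording_band_um // track_pitch_um


def disk_capacity_bits(bits_per_track, tracks):
    """Total capacity = bits on one revolution * number of revolutions."""
    return bits_per_track * tracks


# Hypothetical disk: 40 mm wide recording band, 2 micrometer track pitch,
# 100,000 bits stored on each track.
tracks = track_count(40_000, 2)
capacity = disk_capacity_bits(100_000, tracks)
print(tracks, "tracks,", capacity // 8 // 1_000_000, "Mbytes")
```

Halving the track pitch doubles the track count and therefore the capacity, which is why crosstalk between adjacent tracks is the practical limit on how far the pitch can be reduced.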




F.  FUTURE  TRENDS  AND  POTENTIAL  PROBLEMS 


The future trend of magneto-optic recording is to increase the
areal density, which is similar to the write-once and
read-only technologies, and to enhance SNR. Rothchild,
president of the Rothchild Consultants research firm, claims
that SNR can be easily enhanced by utilizing error detection
procedures similar to formatting a hard disk. Hence, high
error rates due to low SNR can be eliminated. Most of the
technical difficulties encountered with the MOR technology
reside at the media level. A process used to create low
volumes of disks in the laboratory under ideal circumstances
is not easily adapted for mass production of thousands of
disks per hour. Probably, the two major difficulties in mass
production are controlling the thickness of the recording
layer and guaranteeing the integrity of the written data for a
minimum of 10 years.


Therefore, a better amorphous substrate material is required.
Until the introduction of a better medium, the only way of
ensuring data integrity once data is recorded is to protect
the layer from oxidizing. Moreover, a question to ponder is
the importance of a greater than 10 year archivability
[Ref. 26].

Figure 24 depicts the MOR status. Figure 24a illustrates
applications for this technology in comparison with other
technologies.

Magneto-optic drives will be used in a large variety of
applications, from memories for small portable computers to
mainframes with large on-line databases. Some possible uses
include:

(1) an inexpensive replacement for mainframe
peripherals (1 to 2 Gbyte),

(2) an archival back-up for those peripherals,

(3) a replacement for small Winchester
drives (.05 to .3 Gbyte),

(4) an archival back-up for Winchester or for
magneto-optic drives, and

(5) an on-line mass storage system that combines
removability with random access and
terabyte capacity.

G. SUMMARY

Magneto-optical memory products are expected to be introduced
into the marketplace in 1986. When this does happen, the MOR
technology will have an enormous impact on the design of
future digital data storage systems, especially personal
computers.

Magneto-optic drives will combine large storage density, low
cost, random access and erasability with removability.
Removability is particularly important because it will enable
a drive to be used with any medium, whether it be erasable,
non-erasable or read-only. It is very likely that MOR drives
will be multifunctional, thereby allowing software to be sold
on read-only disks, back-up to be performed on either erasable
or non-erasable disks, and erasable disks to be used on-line
for system functions. Inexpensive MOR drives are now practical
because of advances in amorphous RE-TM films.


VII. TWO OTHER RECORDING TECHNOLOGIES


Two other recording technologies which are not as prevalent as
the conventional and optical recording technologies are the
RAM (random-access memory) and the Bernoulli cartridge. These
technologies, especially RAM, do not have the high-capacity
potential and low cost of conventional and optical recording.
However, they do have other superior characteristics. This
chapter introduces these two technologies and their operation,
characteristics, and uses.

A. RAM

In early computer systems, memory technology was very limited
in speed and high in cost. Since the 1970's, the advent of
high-speed random-access memory (RAM) chips has reduced the
cost of computer main memory by more than two orders of
magnitude. Chips no larger than 1/4 inch square contain all of
the essential electronics to store hundreds of thousands of
bits of data or instructions.

Although the RAM acronym indicates the random-access
capability, it is actually a misnomer, since almost all
semiconductor memories, except for a few special types, can be
randomly accessed. A more appropriate name for this memory
would be a read/write RAM, to indicate that data can be
written into the memory as well as be read out of it randomly.

There are two basic types of RAMs, static and dynamic. The
differences are significant. The RAM type refers to the
structure of the actual storage circuit used to hold each data
bit within the memory chip. A dynamic memory uses a storage
cell based on a transistor and capacitor combination, in which
the data is represented by a charge stored on each of the
capacitors in the memory array. The memory gets the name
dynamic from the fact that the capacitors are imperfect and
will lose their charge unless the charge is repeatedly
replenished (refreshed) on a regular basis (usually every
2 ms). If refreshed, the data will remain until intentionally
changed or until the power to the memory is shut off. Dynamic
RAMs require supplementary circuits to do the refreshing and
to assure that conflicts do not occur between refreshing and
normal read/write operations. Although they do have to contend
with these extra supplementary circuits, dynamic RAMs still
require fewer on-chip components per bit than do static RAMs,
which do not require refreshing. Since dynamic RAMs require
fewer components, it is possible for them to achieve higher
densities than static RAMs. These higher densities also lead
to lower costs per bit. Static RAMs, in contrast, do not use a
charge-storage technique; instead, they use either four or six
transistors to form a flip-flop for each storage cell. Once
the data is loaded into the flip-flop storage elements, it
will be held indefinitely until it is purposely changed or the
power is shut off. Static RAMs are easier to design. They
compete well in applications where the memory requirement is
not too great, since the cost of the smaller memory is not
overwhelming.

There is another trade-off to be made with random-access
memories. In addition to the choice of dynamic vs. static
types, there is the choice of MOS (metal oxide semiconductor)
and bipolar chips. Bipolar devices are faster and provide
better performance, but have not yet achieved the higher
densities, and hence the lower cost, of MOS, nor its lower
power consumption.

In order for the RAM technology to be viable as an alternative
to on-line high capacity media storage, this technology must
have the capability of high capacity. In terms of capacity,
since the early 1970s, when a density of 1K (1024 bits) per
chip was introduced, improvements in semiconductor processing
and circuit design have made practical an increase in density.
This increase went from 4K bits on a chip to 16K bits, and in
1980 to 64K bits. Limited production of 256K-bit dynamic RAMs
began in 1983.


Samples of dynamic and static RAM devices and prices are
as follows [Ref. 27]:

(1) Fairchild has introduced a 45 ns MOS 64K x 1-bit
static RAM for $90.00;

(2) Integrated Device Technology states that its
64K x 4-bit static RAM will deliver access times
on the order of 45 ns and cost less than $1000.00;

(3) Hitachi states that its 25 ns MOS x 1-bit
static RAM is the fastest available and will
cost $6P.50 (in bulks of 10,000);

(4) Electronic Designs' 16K x 4-bit MOS static RAM
has an access time of 55 ns and costs
$245.00 (in bulks of 100);

(5) Toshiba intends to market its 45 ns,
64K x 1-bit MOS static RAM in Nov 1985
for $30.00 each;

(6) Toshiba and Vitelic are both introducing
1M x 1-bit MOS dynamic RAM at the end of
this year. These devices employ geometries
of less than 2 microns to attain access times
less than 100 ns.

B. RAM CHARACTERISTICS

Some of the characteristics of MOS-based RAM are:

(1) Consumes little power;

(2) Does not usually require back-up;

(3) Very fast, with access speeds below 100 ns;

(4) Ideal for systems that write a lot, but store
little;

(5) Very expensive;

(6) Areal density is 10 to the 4th bits per square inch
for 64K bit RAM, and

(7) 256K bit density, with 1M bit density in the wings.


A new type of RAM that is gradually making its mark on the
market is the nonvolatile RAM. Nonvolatile memories are a most
interesting and active segment of memory technology. These
devices retain their contents even when the system loses
power. This type of nonvolatile RAM is actually nothing more
than the combination of the flexibility of the RAM with the
permanence of the ROM (read-only memory) when power is
removed. The result is that for every stored bit there are two
memory cells, one of which is volatile and the other
nonvolatile. During normal system operation, the nonvolatile
RAM uses the volatile memory array, but when it receives a
special store signal, data held in the RAM area is transferred
into the nonvolatile section. Thus, the RAM section provides
unlimited read and write operations, while the nonvolatile
section provides back-up when power is removed. The drawbacks
to this almost ideal memory element are twofold. First, it
wears out. That is, the electrical process used to store data
in the nonvolatile array causes a steady deterioration in the
ability of the memory to retain data for a guaranteed period
of time. Currently available capabilities range from about
10,000 to over 1,000,000 write cycles, but many times that
number are needed for general purpose use. Second, nonvolatile
RAMs have only reached 4K bits, with smaller amounts already
finding their way onto single-chip microcomputers [Ref. 29].
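
The shadow-cell arrangement described above (a volatile cell and a nonvolatile cell for every bit, a store signal that copies one into the other, and a limited number of store cycles) can be modelled in a short sketch. The class and method names are hypothetical, and restoring the volatile array from the nonvolatile one on power-up is an assumption of this model rather than something the text states.

```python
class NonvolatileRAM:
    """Toy model of a nonvolatile RAM with a shadow array per bit."""

    def __init__(self, size, endurance=10_000):
        self.volatile = [0] * size      # normal read/write array
        self.nonvolatile = [0] * size   # back-up array, survives power loss
        self.endurance = endurance      # store cycles before wear-out
        self.store_cycles = 0

    def write(self, addr, bit):
        # Ordinary writes touch only the volatile array, so they are unlimited.
        self.volatile[addr] = bit

    def read(self, addr):
        return self.volatile[addr]

    def store(self):
        # The special store signal: copy the RAM area into the nonvolatile
        # section. Each store wears the nonvolatile cells.
        if self.store_cycles >= self.endurance:
            raise RuntimeError("nonvolatile array worn out")
        self.nonvolatile = list(self.volatile)
        self.store_cycles += 1

    def power_cycle(self):
        # Power loss destroys the volatile contents; on power-up the model
        # reloads them from the nonvolatile section (an assumption here).
        self.volatile = list(self.nonvolatile)


ram = NonvolatileRAM(size=8, endurance=2)
ram.write(3, 1)
ram.store()           # bit 3 is now backed up
ram.write(5, 1)       # not stored, so it will be lost
ram.power_cycle()
print(ram.read(3), ram.read(5))
```

The `endurance` counter is what makes the first drawback visible: reads and writes are free, but every store consumes part of a fixed budget.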

C. SUMMARY OF RAM TECHNOLOGY

The possibility of RAM technology replacing spinning media is
remote, although the microprocessor chip technology continues
to improve. Production of 256K chips has been revving up since
earlier this year, and the 512K chip is already a step-child
of the much heralded megabit chip, which is being introduced
now.

Right now, although large semiconductor electronic memories
are available, the cost is prohibitive. Intel's FAST 3825, a
12 MB to 144 MB RAM disk system, which is made up of 64K
chips, is priced in the $100,000.00 range [Ref. 29]. In the
foreseeable future it seems that electronic memories are not
close to the cost per megabyte offered by the spinning
technologies. For example, utilizing the highest density
available today (256K) to put together a 10 MB memory with
256K RAMs, it will eventually get down to


D. BERNOULLI CARTRIDGE

In 1981, three IBM employees left Big Blue to start their own
company in Utah, the Iomega Corporation, to manufacture what
they believed to be the ideal mass storage system disk, the
Bernoulli disk. The Bernoulli disk has the best attributes of
floppy disks and hard disks without their shortcomings. Floppy
disks, for example, trade low storage density and long access
times for portability, ease of backup, and low cost. Hard
disks, on the other hand, trade sensitivity to dirt and shock
for an increase in storage capability and speed.

The Iomega Corporation's Bernoulli disk, the Alpha 10, is an
eight inch, cartridge-loaded floppy disk that holds 10
megabytes. The magnetic medium is only a three mil thick,
mylar floppy disk. Unlike a normal floppy, the Alpha 10 disk
is housed in a magazine-sized plastic cartridge. The
cartridge, like a video cassette, automatically closes up when
removed from a drive, which protects the disk from
contamination. When the cartridge is inserted into the drive,
the disk is exposed to a flat plate over which it will fly
(around the spindle) and move close to the read/write head.
The drive is given stability and the close head-to-disk
clearance crucial to high storage density by taking advantage
of the "Bernoulli Principle".


E. BERNOULLI EFFECT

Daniel Bernoulli, a Swiss mathematician, observed more than
200 years ago that the pressure of a moving fluid is inversely
related to its speed. If a disk spins close to a stationary
surface, a negative pressure is generated between the two and
the stabilizing effects cause the disk to fly at a determined
distance above the rigid Bernoulli plate. Another application
of this principle is in the head design. The read/write head
in the Bernoulli disk is stationary and protrudes through a
banana-shaped slot in the Bernoulli plate. The head mounting
bracket is shaped so it protrudes a few thousandths of an inch
above the plate. These "bumps" in the plate cause the
secondary area of the Bernoulli effect, drawing the disk even
closer to the head, which has 4 to 7 microinches of clearance.

The advantages of this scheme are obvious. Because the disk
flies, rather than the head, disturbance of the device causes
the disk to lose lift and fall away from the head, rather than
toward it. Hence, the head cannot crash into the Bernoulli
disk.

Figure 25 illustrates the Bernoulli pumping effect [Ref. 30].
The Bernoulli technique takes advantage of the rapidly flowing
air (i.e., the air next to the surface of a rotating disk),
high disk rotation speeds, and megabyte data storage of a
floppy disk. The rapid rotation of the disk generates an
airflow that pulls the disk surface toward the drive
read/write head. The shape of the drive head, however, is
engineered to prevent the disk surface from actually touching
the head. As the disk surface approaches the head, an air
bearing of less than ten microinches forms, holding the disk
away from the head.

Figure 25a depicts the three types of products along with
their characteristics. A single, 10 Mbyte drive costs
$2,695.00. A dual drive, 20 Mbytes, costs $3,695.00. When two
Alpha 10 drives are installed with a power supply in a box,
the result is the Bernoulli Box. Figure 25b illustrates the
features and benefits of the Bernoulli disk drive.

F.  SUMMARY 

For small database systems, the Bernoulli disk drive offers a
very viable alternative. The Iomega innovations have created a
system with a 24,000 bits-per-inch density, a 300
track-per-inch track density, a data transfer rate of 1.13
megabytes per second, a system latency of [illegible]
milliseconds, and an average access time of [illegible]
milliseconds. The storage of the Bernoulli disk is as
reliable, quiet, and quick as any Winchester disk currently
available, and much cheaper. Better yet, the Bernoulli disk
provides a credible backup facility for a Winchester disk, one
that doesn't require the incessant changing of many floppy
disks. The cartridges are also tough. Iomega representatives
have thrown Bernoulli cartridges, which cost $80.00 each,
around like frisbees to demonstrate this point. This disk is
hard to beat when looking for a small mass storage system or
for additional storage with a good backup facility.

FEATURES                           BENEFITS

Aerodynamic Media Stabilization    Highest Performance, Reliability
                                   and Areal Density of any
                                   Removable Disk Drive

Unique Equalization Circuits       100 % Interchangeability

Design Simplicity                  Lowest Cost 10 MByte Cartridge

Flexible Media                     More Resistant to Shock and
                                   Highly Resistant to Contamination

No Purge Cycle                     Fastest Stop/Start of any
                                   High Performance Drive

On-Board LSI Controller            Only Disk Subsystem to
                                   Conform to Disk Standard
                                   (Size and Mounting)

SCSI Interface                     Compatible with SCSI H/W
                                   and Protocol

Figure 25b. Features and Benefits of Bernoulli Disk Drives


VIII. TECHNOLOGY COMPARISONS


Conventional magnetic recording is in a state of renaissance
at this time, with gains in density foreseeable for at least
another decade. For the last 35 years, data storage technology
has been dominated by conventional magnetic recording, and the
rate of progress in the areal density, the key measure of
merit, has continued undiminished, doubling about every 30
months for the last 30 years. The parameter which has made
this possible has been the head-to-medium spacing, which has
been reduced over the years from 25 microns to .3 microns,
with current laboratory investigations now at [illegible]
microns. Also important has been the employment of precision
mechanics, such as closed-loop servo systems, as well as
improvements in the media utilized.
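
The growth rate quoted above can be turned into a cumulative factor with a line of arithmetic: a doubling every 30 months over 30 years is 12 doublings. The helper below is a small sketch for checking that figure; the function name is made up for the example.

```python
def density_growth_factor(years, doubling_months=30):
    """Cumulative growth when a quantity doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


# 30 years at one doubling per 30 months is 12 doublings.
print(density_growth_factor(30))   # 4096.0
```

A roughly four-thousand-fold increase in areal density over the period is what the reduction in head-to-medium spacing had to support.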

This continuing resurgence of conventional magnetic recording
places increasing pressure on the optical technology. The pace
of optical storage devices entering the marketplace has been
rapidly increasing. A significant development was the
introduction of DAD and CD-ROM. These products will make the
market acceptance of optical read-only, write-once, and
erasable storage devices easier. Although storage

Figure 26. Density and Data-Rate Comparison

                 MAGNETIC DISK    OPTICAL DISK

Direct Access

Figure 26a. Access, Capacity and Seek-Time Comparison.

                 MAGNETIC DISK    OPTICAL DISK

Removability     No               Yes

Cost             Medium           Low

Figure 26b. Removability and Cost of Media.

                 MAGNETIC DISK    OPTICAL DISK

Shelf Life       > 10 Yrs         > 10 Yrs

Encapsulation    No               Yes

Figure 26c. Sturdiness and Archivability.

Figure 26d. Head-Disk Gap and Track Servo.

advances in the storage
information have radically transf
which decisions are made today,
simultaneously expanded the variety
nation involved in decision making
the pace at which decisions have

, the phenomenal growth in technol
and distribute information has not
mmensurate growth in technologies
and analyze huge volumes of informa
imbalance has been created between
media and the technologies needed to
the stored information. The result has
exploit the full value of inform

In this age of increased attention to the problem
information processing and utilization, one seeks to
formatted databases advances in techniques for
abstractions, models, structures, accesses, retrie
compressions and models, as well as differential files.

An abstraction hides the details of a set of algorithms and
data and allows general and common properties of the set of
algorithms and data to be revealed. Thus, abstraction is one
of the main ways of structuring and visualizing vast amounts
of data and very complex algorithms. It is used to obtain
categories of algorithms and data and to combine categories
into more general categories. It has been used extensively in
computer science to reduce complexity and aid understanding of
algorithms and data.

An elementary form of abstraction distinguishes between the
token level and the type level. A token is an actual value or
a particular instance of an object. Abstraction is used to
define a type from a class of similar tokens.

In terms of database objects, abstraction is used in two ways:
generalization and aggregation [Ref. 31].

In generalization, a set of similar tokens or a set of like
types is viewed as one generic type. The token-type
generalization is usually differentiated from the type-type
generalization. The former process is referred to as
"classification", while the latter process is called
"generalization". For instance, viewing a set of individual
employees as one generic type, employee, is considered
classification, while viewing the types of employees and
students as one generic type, person, is considered
generalization.

The classification of tokens enhances understanding by
allowing individual tokens to be grouped into types. Types can
be further generalized into other, more general types. By
using classification and generalization, the emphasis is
placed on the similarities of objects and types while
abstracting away their differences and details. Figure 27
illustrates this distinction.
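
The employee/student/person example above maps naturally onto classes in a programming language. The sketch below is one possible rendering, with hypothetical class and variable names: creating instances of a class corresponds to classification (token to type), and a shared base class corresponds to generalization (type to type).

```python
class Person:
    """Generic type obtained by generalizing Employee and Student."""

    def __init__(self, name):
        self.name = name


class Employee(Person):
    """Type obtained by classifying individual employee tokens."""


class Student(Person):
    """Type obtained by classifying individual student tokens."""


# Tokens: particular instances (the names are made up for the example).
e1, e2 = Employee("E1"), Employee("E2")
s1 = Student("S1")

# Classification: each token belongs to a type.
print(isinstance(e1, Employee), isinstance(s1, Student))

# Generalization: both types are viewed as the one generic type, person,
# so every token can also be treated as a Person.
print(issubclass(Employee, Person), all(isinstance(t, Person) for t in (e1, e2, s1)))
```

Code that works on `Person` ignores whether a token is an employee or a student, which is the "abstracting away differences" that the paragraph describes.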

An aggregation is the abstraction by which an object is
characterized by its constituent objects. For instance, a
person can be characterized by his name, address, and age.
Aggregation can be used either at the token level or at the
type level. For instance, the type employee can be
characterized by the types: name, age, and address. An
aggregation at the type level portrays a set of aggregations
at the token level of the constituent types. Figure 28
illustrates this point.
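
Aggregation at the type level and at the token level can be illustrated with a record type. The sketch below uses a Python dataclass; the field names and the sample values are hypothetical, chosen to follow the name, employee number, age, and address constituents of the employee example.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    """Type-level aggregation: employee is built from constituent types."""
    name: str
    employee_no: int
    age: int
    address: str


# Type level: the aggregate is described by the types of its constituents.
print([(f.name, f.type) for f in fields(Employee)])

# Token level: one particular aggregate of constituent tokens.
e1 = Employee(name="A. Smith", employee_no=1, age=30, address="12 Main St")
print(e1)
```

Every `Employee` instance is one token-level aggregation, and the class definition stands for the whole set of them, which is the relationship the last sentence of the paragraph describes.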

Abstractions have been used informally in data management for
a long time. While aggregation is used during file design to
group fields of different data types in a common file,
generalization is used by introducing the notion of a file as
a generic record type representing the properties of many
records. Moreover, the record type has semantics and is no
longer an uninterpreted set of records.


Abstraction can be used with either the bottom-up approach or
the top-down approach. Using the bottom-up approach, an
abstraction can be viewed as a synthesis of simple objects
that enables one to understand a complex object. One starts
with observed data, i.e., the tokens, to which one applies
classification to produce types; then generalization and
aggregation can be used to group and structure types into new
generic and aggregate types.

Alternately, the top-down approach may be used to decompose
complex types. Starting with a complex type, it can be
decomposed into its components, through specialization, which
is the opposite process to generalization, and instantiation,
which is the opposite process to aggregation, down to the
token level. Typically, the bottom-up approach is used to
understand a complex phenomenon and the top-down approach is
used to design a complex object. Both methods can also be used
together.

These two abstraction techniques are generally present
in most data models. Some data models first define the
tokens of information and then give structuring principles
to combine and categorize them, while other data models
enable the user to specify complex types which are
associated with constituent types and eventually with tokens
of information. Abstractions are used to give meaning to
sets of objects, whether they are tokens or types.


Generalization is the abstraction technique by
which a group of objects is generically
classified.

Example:

    employee = generic [E1, E2, E3]

    E1 = employee # 1
    E2 = employee # 2
    E3 = employee # 3

Figure 27. Generalization


Aggregation is the abstraction technique by
which an object is constructed from its
constituent objects.

Example:

    employee = aggregate [Nm, E#, Ag, Ad]

    Nm = name
    E# = employee no.
    Ag = age
    Ad = address

Figure 28. Aggregation
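
The two abstractions of Figures 27 and 28 can be illustrated
with a minimal Python sketch. The class name, field names, and
sample employees below are hypothetical and serve only to show
the distinction: aggregation builds one object from its
constituents, while generalization groups tokens under one
generic type.

```python
from dataclasses import dataclass


# Aggregation: an employee is constructed from its constituent
# objects (name, employee number, age, address), as in Figure 28.
@dataclass(frozen=True)
class Employee:
    name: str
    employee_no: int
    age: int
    address: str


# Tokens: individual observed employees (E1, E2, E3 in Figure 27).
# The values are made up for illustration.
e1 = Employee("Adams", 1, 31, "12 Oak St")
e2 = Employee("Baker", 2, 45, "7 Elm Ave")
e3 = Employee("Clark", 3, 28, "3 Pine Rd")

# Generalization: the tokens are classified under one generic type.
employee_generic = [e1, e2, e3]


def is_member(token, generic):
    """Return True if a token belongs to the generic class."""
    return token in generic
```

Read bottom-up, the tokens are classified and then grouped;
read top-down, the generic type is specialized and the
aggregate instantiated until individual tokens are reached.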

A. CURRENT ABSTRACTION APPROACHES FOR STATISTICAL ABSTRACTS

The current approaches for statistical abstracts are
sampling (aggregation) or antisampling (generalization).
Figure 29 illustrates the difference between these two
approaches. Sampling is the selection of constituent
objects from the whole for an analysis and estimation of the
nature of the whole. Antisampling is defined as the
selection of a generic superset of the set in question in
order to analyze the nature of the set.

In evaluating the relative merits of sampling and
antisampling, there are numerous serious disadvantages
to attempting to estimate statistics on a population by
statistics on a random sample of that population. There
are six disadvantages [Ref. 32].

Firstly, the data have sometimes already been aggregated
into means, counts, and so on, as happens with the large
amounts of data from instrument readings in laboratory
experiments. For example, much of the published U.S. Census
data are in statistical forms to provide privacy protection
for an individual's data values. Sampling aggregated data
can be very tricky, and may not be possible without detailed
information about the data before they are aggregated.


Secondly, sampling is inefficient in a paging
environment. Assume that the sample items are randomly
distributed across pages, so that the same formula applies
as for any set randomly distributed across pages.
Experiments [Ref. 33] have shown that sampling is then less
page-efficient than a full retrieval of the entire database
by a factor of approximately the number of pages retrieved.
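
The formula itself is not reproduced in this text. As a
hedged illustration only, the sketch below assumes a common
textbook approximation for the expected number of distinct
pages touched when items are drawn uniformly at random (with
replacement) from a file of equally likely pages. The function
names are hypothetical.

```python
def expected_pages_touched(num_pages: int, sample_size: int) -> float:
    """Expected number of distinct pages hit by a random sample.

    Assumption: every sampled item is equally likely to fall on
    any page, and items are drawn with replacement.
    """
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    miss_probability = (1.0 - 1.0 / num_pages) ** sample_size
    return num_pages * (1.0 - miss_probability)


def sampled_items_per_page_read(num_pages: int, sample_size: int) -> float:
    """Average number of useful (sampled) items per page fetched."""
    pages = expected_pages_touched(num_pages, sample_size)
    return sample_size / pages if pages else 0.0
```

Under these assumptions a small sample yields only about one
useful item per page read, whereas a full retrieval uses every
item on every page it reads.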

Thirdly, random sampling is also inefficient even when
indices are used. Depending on how the index is stored, it
may require more temporary storage space for the pointers to
all the items in the set, and many index page accesses.

Fourthly, sampling is a poor way to estimate extremum
statistics such as maximum, mode frequency, and bounds
on distributional fits. Extrema have important
applications in identifying exceptional or problematic
behavior. Similarly, it is very poor for obtaining absolute
bounds on statistics, which are important for many
computer algorithms based on those statistics.

Fifthly, sampling is restricted to the nature of the
sample itself. Given a sample, it is hard to speculate
about properties of a subset, superset, or sibling of that
set.

Lastly, a sample does not have semantics. It is of
interest only as a sample, and not in its own right as a set
created by set intersections might be. As an alternative to
sampling, the author in [Ref. 34] suggests the creation of a
much smaller database, called a database abstract, which is
a collection of simple statistics, such as means, maxima,
etc., on important and frequently asked-about data in the
database. The database abstract preserves most of the
statistical content of the original data, in order to
compensate for the above disadvantages. This database
abstract is the major element in antisampling.


Example:

    Anti-Sample 1              Anti-Sample 2
    (e.g. Ohioans)             (e.g. Ages 25-36)

    Population P
    (e.g. Ohioans Ages 25-36)

    Sample S
    (e.g. Ohioans Ages 25-36
    with middle SSN digit 5)

Figure 29. Sampling vs. Antisampling

In utilizing this abstract, processing speed can be
traded off for storage. Since statistical databases often
have much redundancy in attribute values, and since
these statistics can be predicted from other attribute
values and statistics, they can be computed by programs on
cheap processors instead of being placed expensively on
secondary storage. A number of "reasonable-guess" rules can
be used to infer statistical characteristics of the original
data from the abstract. This technique provides an estimate
for data in the initial stages of statistical analysis,
emphasizing quick and rough estimates and visual
displays. It is directed towards hypothesis generation, not
hypothesis testing.

This approach, by the employment of a database abstract
of precomputed statistics plus inference rules, overcomes
each of the abovementioned points, as indicated below:

(1) The database abstract is an aggregation.

(2) Once set up, the database need not be paged at all.
    Paging of the abstract is low, since it is much
    smaller than the full database. Also, there are
    usually many sets of statistics relevant to a query,
    hence fewer retrievals are necessary than the
    retrievals for the same query without the statistics
    on the full database.

(3) Database index pages are used efficiently for
    the same reasons.

(4) Antisampling handles extremum statistics well since
    it can use extremum statistics of the entire
    database as bounds.

(5) Many rules explicitly address such cases as
    extensions of a set to supersets and
    restrictions of a set to subsets.

(6) Sets in the database abstract have an explicit
    semantics.

This approach provides a new alternative to sampling for
exploring a large data population at low cost.


B. AN OVERVIEW OF THE ANTISAMPLING APPROACH

This top-down approach to low-cost estimations of
statistics on a large computer database consists of a
precomputed set of statistics known as a database abstract
and a set of inference rules. This new approach starts with
a user and a database. The database is preprocessed to
create a database abstract, which is a collection of
simple statistics (the mean, maximum, mode frequency, etc.)
on important and frequently asked-about sets in the
database. The user interacts with an interface to the
database abstract, and asks the same statistical questions
that he would ask the full database, if he had more time or
space. If an answer is not in the database abstract,
an estimate and bounds on the estimate are inferred for
the answer from rules.
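
The preprocessing and lookup steps described above can be
sketched as follows. This is a minimal illustration, not the
design of [Ref. 34]: the function names, the dictionary
layout, and the choice of stored statistics are assumptions.

```python
from statistics import mean


def build_abstract(named_sets):
    """Precompute simple statistics for each frequently queried set.

    named_sets maps a set name to a non-empty list of numeric values.
    """
    abstract = {}
    for name, values in named_sets.items():
        abstract[name] = {
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
        }
    return abstract


def query(abstract, set_name, statistic):
    """Answer from the abstract if possible.

    Returns ("exact", value) when the statistic is stored, and
    ("unknown", None) otherwise, in which case a caller would
    fall back on inference rules to obtain bounds or estimates.
    """
    entry = abstract.get(set_name)
    if entry is not None and statistic in entry:
        return ("exact", entry[statistic])
    return ("unknown", None)
```

Once the abstract is built, queries of this kind never touch
the full database, which is the source of the paging savings
claimed for the approach.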

There are four dimensions to this rule taxonomy [Ref.
34]. These are:

(1) The statistical dimension, such as means and maxima.

(2) The characteristic dimension, such as exact
    answers, bounds and estimates.

(3) The computational dimension, for example, what
    forms of queries.

(4) The derivation dimension, for example, from where
    the results are derived.

For an example of the rule taxonomy, see Figure 30.

Given the disadvantages of the sampling approach, it
would appear that the antisampling approach opens up a broad
area for future research. Antisampling is not just
another sampling method, but something fundamentally
different, and subject to quite different advantages and
disadvantages than sampling. Although some of its
advantages have been discussed, one disadvantage that has
not been mentioned is the amount of detail that remains to
be worked out, such as increasing the number of rules and
obtaining better estimates.

Some new directions for further applications have been
outlined, as well as extensions of this technique [Ref. 34].
Some extensions include rules for correlations, causations,
rules for intensional knowledge, rules for prototypes,
dependencies, quantifiers, and nulls. More sophisticated
control structures can be readily implemented. Special-
purpose hardware could improve performance of the system.
For example, obtaining the database abstract is
computationally expensive, and special devices for
computing the basic aggregate statistics on the database
might be very helpful, perhaps as components in disk
drives. Such devices would also improve answer speed for
arbitrary statistical queries on the database.
Moreover, since paging is a major cost in this system, the
use of read-only memory for the database abstract might
significantly improve its performance. This could be quite
cost effective for much used databases like the U.S.
Census.


Rule: The largest item in the intersection
      of two sets cannot be larger
      than the minimum of the maxima
      of the two sets.

    Statistical Dimension:     Rule for Max Statistic

    Characteristic Dimension:  Upper Bound

    Computational Dimension:   Intersection of the Sets

    Derivation Dimension:      Basic Mathematics

Figure 30. Example of the Rule Taxonomy
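
The rule of Figure 30 can be written out as a short sketch.
The `Rule` record and its field names are hypothetical; they
simply mirror the four taxonomy dimensions. The bound itself
follows from basic mathematics: every member of the
intersection belongs to both sets, so it cannot exceed either
maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One inference rule, classified along the four dimensions."""
    statistical: str
    characteristic: str
    computational: str
    derivation: str


MAX_OF_INTERSECTION = Rule(
    statistical="max",
    characteristic="upper bound",
    computational="intersection of sets",
    derivation="basic mathematics",
)


def max_upper_bound_for_intersection(max_a, max_b):
    """Upper bound on max(A & B) given only max(A) and max(B)."""
    return min(max_a, max_b)
```

The bound is computed from two stored statistics alone, so it
can be answered from the database abstract without retrieving
either set.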

The idea of providing, for the first time, an
alternative to sampling for estimating characteristics
of a large data population at low cost is very appealing.

C. CURRENT ABSTRACTION METHODS FOR CONTENT ABSTRACTS

The current abstraction methods for content
abstracts utilize content analysis concepts [Ref. 35]. They
are symbol-matching, content-oriented, element retrieval and
information retrieval methods.

Content analysis is a process of delineating what
a sentence says. Only humans can interpret and
fully comprehend the meaning of a natural language sentence
that may be incomplete, idiomatic, and valid only in the
context of the dialogical communication. Moreover, only
humans can appreciate the differences in the wording of
sentences, or can easily interpret the jargon.

In using this technique, the body of text that is to
become the retrieval database consists of sentences. A
sentence means an English sentence or partial sentence with
no formal restrictions on its structure or organization.
Sentences are the foundation unit of which the database is
composed because they are the basic unit of human
communication.

Using the symbol-matching retrieval technique, the
database is searched to locate the data element that
contains a certain symbol or a sequence of symbols. Once
this symbol is located, the data element may be retrieved
as a whole or may be subject to manipulation. An example of
such a technique is the keyword-indexing technique, in which
the symbol being searched for is an English word.
However, this technique is not considered very viable,
since the data cannot be counted on to contain appropriate
key symbols for indexing, as is the case with the
example, "Hit the deck".
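
A minimal sketch of keyword indexing follows. It is an
illustration under simple assumptions (lower-cased
alphanumeric words as symbols, sentences numbered by
position), and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_keyword_index(sentences):
    """Map each word to the ids of the sentences that contain it."""
    index = defaultdict(set)
    for sentence_id, text in enumerate(sentences):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(sentence_id)
    return index


def lookup(index, sentences, keyword):
    """Return every sentence containing the keyword, in stored order."""
    ids = sorted(index.get(keyword.lower(), ()))
    return [sentences[i] for i in ids]
```

The weakness noted above shows up directly: "Hit the deck"
can only be found through the literal words "hit" or "deck",
never through a keyword that expresses what the sentence
means.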

With content-oriented retrieval, a data element is
identified according to its content or meaning, rather than
by the symbols or keywords. Too often, natural English
sentences may not be complete, such that a pronoun is used
instead of a noun, which may be implied. For example,
"He was not able to continue" is a sentence that has no
key symbols, and hence only content-oriented retrieval would
be appropriate. Moreover, the meaning of a sentence
may be conveyed by its style, or its composition is
frequently more important than the meaning of individual
component elements. Considering the sentence "Time flies",
the meaning is entirely different when each word is
considered separately rather than jointly. The preferred
approach is that, whenever a content-oriented technique is
used, a symbol-matching technique is also used in
conjunction with it.

Content abstraction encompasses the following
two types of retrieval: element and information retrieval.
Element retrieval returns the entire data element
satisfying the query, whereas information retrieval returns
only the answer extracted from the data element. For
example, given a database that includes the following
sentence: "Company X will conduct a reconnaissance at 0600
hours", and considering the query, "What is Company X
doing at 0600 hours?", an element retrieval system would
retrieve the entire sentence, while an information
retrieval system would only reply with "reconnaissance".
Either retrieval technique is acceptable, depending on
the output desired.

The content analysis technique is
performed on each sentence by filling in a standard
abstract form (see Figure 31). The technique begins by
assigning an identification number to each sentence. Then,
the content of each sentence is analyzed; all applicable
properties are checked. These five properties are [Ref. 35]:

(1) Type - whether it is declarative, interrogative,
    imperative, responsive, exclamatory, or
    acknowledgment. Type is applicable to every
    sentence.

(2) Nature - whether it is a requisition, conclusion,
    characteristic, valuation, or recommendation.
    Nature is not applicable to every sentence.

(3) Tone - whether it is affirmative, uncertain, or
    negative. Tone is applicable to every sentence.

(4) Tense - whether it is past, present, or future.
    Tense is applicable to every sentence.

(5) Mode - whether it is altering the characteristics of
    the things that the sentence talks about, such as
    obligation, intention, permission, ability, risk,
    and desire; occurring with certain probability;
    and having a finite duration such as at the
    beginning, somewhere in process, or terminating.
    Mode is not applicable to every sentence.

Mode alterations, also known as transformations, are
identified by denoting every entity mentioned or implied in
the particular sentence with its identification
number entered in the appropriate blank. All attributes
(characteristics) applying to that entity are checked.
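
One way to picture the abstract form is as a record with one
field per property. Figure 31 is not reproduced in this text,
so the layout below is an assumption; the class and field
names are hypothetical. Properties that are not applicable to
every sentence (nature and mode) are optional.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class SentenceType(Enum):
    DECLARATIVE = "declarative"
    INTERROGATIVE = "interrogative"
    IMPERATIVE = "imperative"
    RESPONSIVE = "responsive"
    EXCLAMATORY = "exclamatory"
    ACKNOWLEDGMENT = "acknowledgment"


class Tone(Enum):
    AFFIRMATIVE = "affirmative"
    UNCERTAIN = "uncertain"
    NEGATIVE = "negative"


class Tense(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


@dataclass
class SentenceAbstract:
    """Hypothetical record for one filled-in abstract form."""
    sentence_id: int
    type: SentenceType
    tone: Tone
    tense: Tense
    nature: Optional[str] = None  # not applicable to every sentence
    mode: Optional[str] = None    # not applicable to every sentence
    # entity identification number -> attributes checked for it
    entities: Dict[int, List[str]] = field(default_factory=dict)
```

For instance, "Company X will conduct a reconnaissance at
0600 hours" would be recorded as declarative, affirmative,
and future, with the remaining fields left to the analyst.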

Although this abstraction technique appears to cover
the broad range of meaning that can be contained in an
English sentence, there are numerous constraints with this
approach. If an extremely detailed abstraction scheme is
utilized, one that accurately reflects all or most of the
subtlety and complexity of natural English sentences,
extensive training and great intelligence on the part of
the analyst would be required. If, on
the other hand, a sentence analysis scheme simple enough
for low-level personnel is used, it could be too
inaccurate. Moreover, this scheme has difficulty dealing
with sentences that have word usage errors. Also, semantic
interpretation depends on how good the input analysis is,
and content abstracts do require human intervention.

This technique can be enhanced by extending some of its
capabilities. First of all, the abstraction form can be
readily automated. Secondly, an inferential retrieval
technique can be utilized when an answer to a query is
not directly contained in the database. For example, if a
database contained the sentences, "Company X is a part of
Battalion Y", and "Battalion Y is on maneuvers in
Honduras", and the query were, "Where is Company X?",
a sophisticated element retrieval system should be
able to return the two sentences concerning Company X
and Battalion Y, thereby enabling the inference that
Company X is probably in Honduras. Thirdly, the
whole process can be automated, thereby saving time,
cost, and effort.
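
The Company X example can be sketched as a chained element
retrieval. The sketch assumes that the entities mentioned in
each stored sentence have already been identified (for
example, on the abstract form); the function name and the
`max_hops` parameter are hypothetical.

```python
def inferential_retrieve(database, entity, max_hops=2):
    """Return sentences linked to an entity through shared entities.

    database is a list of (sentence, set_of_entities) pairs.
    Hop 1 finds sentences that mention the entity itself; each
    further hop follows the other entities those sentences name.
    """
    found = []
    seen_entities = {entity}
    frontier = {entity}
    for _ in range(max_hops):
        next_frontier = set()
        for sentence, entities in database:
            if sentence in found:
                continue
            if entities & frontier:
                found.append(sentence)
                next_frontier |= entities - seen_entities
        seen_entities |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return found
```

The system returns the supporting sentences and leaves the
final inference ("probably in Honduras") to the user, which
matches the element retrieval behavior described above.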


D. THE AUTOMATIC DATA ABSTRACTING

Despite recent advances in technology, high quality
abstracts must still be produced manually. Consequently,
the preparation of abstracts and their associated indices
accounts for over half of the cost and time required for
publication. Thus one must take into account the factors
of production time and cost when comparing the manual
with the mechanized abstracting methods.

Two alternative solutions exist to overcome the problems
associated with manual abstract production:

(1) Use author-prepared abstracts as a prerequisite
    to publication.

(2) Mechanize the abstract production.

Abstracts from authors are of questionable value, even
though editors and publishers have in recent years made an
effort to get good abstracts from their authors. Thus,
together with the increasing shortage of qualified
abstractors, the factors of time, cost, and value have
lent impetus to a trend toward the automatic generation
of abstracts and indices. This trend has caused increased
emphasis to be placed on the abstract as the locus of data
for automatic retrieval systems. This, of course,
necessitates the creation of high-quality abstracts.

1. System Requirements

The basic requirements which an automatic
abstracting system must fulfill include the unit of data
to be processed, methods of sentence selection, notions
of contextual inference, intersentence reference, and
coherence criteria [Ref. 36].

a. The Basic Unit of Data

A language processing program, for efficiency,
should consider the largest independent item in its
database as its basic unit. Thus, in automatic
abstracting the basic unit is the original article. It
would be inadequate to consider any approach in which either
paragraphs or sentences are considered basic units,
because of the interdependence between these elements and
the remainder of an article. A program which
operates on interdependent units bears the burden of
carrying data from one unit to the next. An automatic
language processing program must also be able to identify
and manipulate the elements of its basic data unit, whether
these elements be words, phrases, clauses, or sentences.

b. Sentence-Selection Methods

In order to develop criteria for selecting
sentences to form an abstract, it is necessary to analyze
the conditions under which various methods of sentence
selection are successful. It is apparent, however,
that an abstract can also be produced by rejecting
sentences of the original which are irrelevant to the
abstract. Therefore, methods of
rejecting sentences also deserve intensive analysis.


c. Notions of Contextual Inference

Contextual inference is the basic concept
underlying sentence selection or rejection. Thus, given a
data element (word or word string) within a sentence and
some surrounding context, using contextual inference, it
is generally possible to infer whether a sentence should
be rejected or selected for inclusion in an abstract.
Contextual inferences may be made based on either the
physical arrangement of the elements of a document, called
the location method, or on word strings which comprise
these elements, called the cue method. These two basic
approaches to the making of contextual inferences are
discussed as follows:


(1) The Location Method. The location method
is based on the physical arrangement of the elements of
an article. This arrangement can be described in terms of
the location of a sentence with respect to the length of the
document containing the sentence, or in terms of the
location of phrases, clauses or words with respect to
the length of the sentence containing these elements.

The first location type, called sentence
location, is governed by the style of the author or the
editor, with general writing guides providing advice
about the placement of sentences within an article. Since
it is not possible to dictate the matter of
style, the location of a sentence does not convey an
unequivocal criterion for sentence selection or rejection.

The second location type is actually a
sentence description, since the location of phrases and
words within the sentence is subject to grammatical rules
to which authors and editors adhere.

Since a sentence is a string of words
terminated by a period, question mark, or semicolon,
punctuation plays an important role in the
location method. Each punctuation mark serves a
specific purpose.

Both the question mark and the semicolon
have a rather unambiguous use; the period, however, is used
in abbreviations and numbers, as well as at the end of a
sentence. These different usages must be differentiated
in order to properly analyze sentences.
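
A hedged sketch of how the uses of the period might be told
apart when splitting text into sentences is given below. The
abbreviation list is a small illustrative assumption, and the
function name is hypothetical; a real system would need a far
larger list and would still miss an abbreviation that happens
to end a sentence.

```python
# Illustrative abbreviation list (an assumption, not from the source).
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "no.", "u.s."}


def split_sentences(text):
    """Split text at periods, question marks, and semicolons.

    A period is NOT treated as a terminator when it
      - sits between two digits (a number such as 3.5),
      - is followed directly by a letter (inside "e.g."), or
      - ends a known abbreviation.
    """
    sentences = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in ".?;":
            continue
        if ch == ".":
            prev_is_digit = i > 0 and text[i - 1].isdigit()
            next_is_digit = i + 1 < len(text) and text[i + 1].isdigit()
            if prev_is_digit and next_is_digit:
                continue  # decimal point inside a number
            if i + 1 < len(text) and text[i + 1].isalpha():
                continue  # period inside a token such as "e.g."
            token = text[start:i + 1].split()[-1].lower()
            if token in ABBREVIATIONS:
                continue  # period that closes an abbreviation
        piece = text[start:i + 1].strip()
        if piece:
            sentences.append(piece)
        start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Question marks and semicolons need no such checks, which
reflects their unambiguous use noted above.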

Commas, like periods, can also have several
uses. They can separate items in series, parenthetical
expressions, and clauses, as well as occurring in
numbers. Serial or [...] make [...] determination [...]
either rejection or acceptance. Parenthetical commas, on
the other hand, since they normally merely elucidate, can
be rejected, because they can be removed without affecting
the meaning of the sentence. Commas that separate clauses
are more important in this rejection/acceptance scheme,
since they delimit the leading clause from clauses which
qualify it. Second or subsequent clauses generally modify
the first clause, which implies that the first clause is
essential to the meaning of the sentence. However, second
and subsequent clauses can be rejected, and a sensible
result still be obtained. For example, the sentence,
"Company X is the best fighting unit in the Division,
because its commander is Italian", is just as grammatically
correct and coherent if its subordinate clause is removed
(rejected), with the result, "Company X is the best
fighting unit in the Division". Of course, this
rejection depends entirely on the abstract desired.

When dealing with clauses, further
reductions can be made by removing prepositional phrases.
Thus, the sentence, "Company X is the best fighting
unit", perfectly expresses the original meaning,
depending, of course, on the abstract desired.
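
The rejection of second and subsequent clauses can be
sketched as follows. The sketch assumes that a qualifying
clause is introduced by a comma followed by one of a short,
illustrative list of subordinating words; the list and the
function name are assumptions. Removing prepositional phrases
would require grammatical analysis and is not attempted here.

```python
# Illustrative list of words that open a qualifying clause (assumption).
SUBORDINATORS = ("because", "although", "since", "which", "while", "if")


def leading_clause(sentence):
    """Keep the leading clause and reject clauses that qualify it."""
    body = sentence.rstrip(".")
    lowered = body.lower()
    cut = len(body)
    for word in SUBORDINATORS:
        position = lowered.find(", " + word + " ")
        if position != -1:
            cut = min(cut, position)
    return body[:cut].strip() + "."
```

Because serial and parenthetical commas are not followed by
a subordinating word, they do not trigger the cut, which keeps
this sketch from truncating simple lists.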

Thus, the location method, which is based
on the general hypothesis that certain headings precede
important passages and that topic sentences occur early or
late in a paragraph, is based on contextual inference.
Punctuation, words, phrases and clauses are sentence
elements whose context can be used to infer whether a
sentence has value for an abstract or not [Ref. 36].

(2) The Cue Method. The method of
sentence selection or rejection based on cue words is
called the cue method. Cue words are words that
provide unambiguous clues to such things as opinion and
subjectivity, as well as to some positive notions. These
cue words are normally contained in the dictionary, along
with codes, which indicate the frequency of occurrence
within a context (see Figure 32).

The cue method provides a powerful approach
to sentence selection or rejection. The method depends on
the fact that it is possible to decide what should or
should not be included in an abstract, based upon the
presence in the original article of particular words or
combinations of words. For example, a sentence containing
words that may indicate the purpose of a document, such as
"My thesis", is an excellent candidate for acceptance.
Opinions and subjective notions, which should not be
included in an abstract, can be identified by such cue
words as "obvious" or "believe". Moreover, the code of a
cue word may depend on its position in a sentence. A
sentence starting with "A" or "Some" is more likely to
present detailed descriptions than a sentence which contains
either of these words in a more central location of the
sentence. The reason for this is that these words have a
strong quantitative function when they appear at the
beginning of a sentence. Similarly, sentences which
begin with participles are usually conditional in
nature, indicating assumptions or conjectures.

Cue words may also be used to identify
parenthetical expressions, idiomatic expressions and
cliches, as well as to carry syntactic roles in cases where
there is no ambiguity. Thus cue words make it easier to
determine whether a sentence or phrase should be selected
for the abstract [Ref. 36].
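
A small sketch of position-dependent cue coding follows. The
three cue lists use only the examples quoted above; the code
names ("accept", "reject", "detail", "neutral"), the function
name, and the precedence of rejection over acceptance are
assumptions, since the codes of Figure 32 are not reproduced
here.

```python
import re

ACCEPT_CUES = ("my thesis",)           # signals the document's purpose
REJECT_CUES = ("obvious", "believe")   # signals opinion or subjectivity
LEADING_DETAIL_CUES = ("a", "some")    # only meaningful as first word


def _contains(text, cue):
    """True if the cue occurs in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(cue) + r"\b", text) is not None


def cue_code(sentence):
    """Assign a hypothetical code to a sentence from its cue words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    text = " ".join(words)
    if any(_contains(text, cue) for cue in REJECT_CUES):
        return "reject"
    if any(_contains(text, cue) for cue in ACCEPT_CUES):
        return "accept"
    # "A" and "Some" are cues only at the start of the sentence.
    if words and words[0] in LEADING_DETAIL_CUES:
        return "detail"
    return "neutral"
```

The last test on `words[0]` is what makes the code depend on
position: the same word in the middle of a sentence yields no
cue at all.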

d. Intersentence Reference

Intersentence references give quite a bit of
information about the logical relationships within the text
material, by the use of multiple clauses, cue words, and
title words. When more than one clause exists in a
sentence, the first clause is indispensable to the meaning
of the sentence, and generally the first clause will also
contain intersentence references if there are any. Words in
the second and subsequent clauses which require antecedents
usually refer to the first clause. Some cue words that
indicate an intersentence reference are "these", "they",
and "it". When these words have multiple uses, additional
criteria are required in order to determine if there is
an intersentence reference. Such criteria include
discovering patterns in the use of words such as "it" to
make possible the use of these words to detect
intersentence references. The following logical rule
could emerge: "It" in the first or only clause indicates
intersentence reference unless followed closely by "that".
For example, the sentence, "It is known that Saint Nick
represents Santa Claus", does not refer to a previous
sentence, while the sentence, "It was still rolling",
does. Of course, there will always be exceptions to this
rule.
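
The "It ... that" rule can be sketched in a few lines. The
text does not say how close "closely" is, so the window of
four words below is an assumption, as are the function name
and the use of commas and semicolons to delimit the first
clause.

```python
import re


def it_signals_reference(sentence, window=4):
    """Apply the rule: "it" in the first clause indicates an
    intersentence reference unless "that" follows within
    `window` words.
    """
    first_clause = re.split(r"[,;]", sentence, maxsplit=1)[0]
    words = re.findall(r"[a-z']+", first_clause.lower())
    if "it" not in words:
        return False
    position = words.index("it")
    following = words[position + 1 : position + 1 + window]
    return "that" not in following
```

As the text warns, such a rule will have exceptions; the
sketch only shows that the pattern is mechanical enough to
test.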

There are, however, intersentence references
that do not make use of any cue words; instead, they use
the name of the antecedent rather than a pronoun.
Intersentence references of this type can be detected by
the title words. If any non-function words, that is,
words which are not articles, conjunctions,
prepositions, etc., occur in adjacent
sentences, the sentences are likely to be closely
related. An example would be the sentences, "Hydrogen and
oxygen form water" and "Water is colorless".

The method of selecting sentences by title
words relies on a special type of intersentence reference,
namely that between the title and the rest of the sentences
in a document. The title method has the premise that the
author generally describes in as few words as possible the
essence of his paper. Thus, it can be safely assumed that
the words of the title are well chosen and of high
significance [Ref. 36].
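
Both the adjacent-sentence test and the title method reduce
to finding shared non-function words. The sketch below assumes
a short illustrative list of function words (articles,
conjunctions, and prepositions); the list and the function
names are hypothetical.

```python
import re

# Illustrative function-word list (assumption): articles,
# conjunctions, and prepositions only.
FUNCTION_WORDS = {
    "a", "an", "the",
    "and", "or", "but", "nor",
    "of", "in", "on", "at", "to", "by", "for", "with", "from",
}


def content_words(sentence):
    """Return the set of non-function words in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in FUNCTION_WORDS}


def shared_content_words(first, second):
    """Non-function words common to two sentences."""
    return content_words(first) & content_words(second)


def title_word_hits(title, sentence):
    """Title words that reappear in a sentence of the document."""
    return shared_content_words(title, sentence)
```

A non-empty result suggests that two sentences are related,
or that a sentence echoes the title and is therefore a
candidate for the abstract.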

2. The Coherence Consideration

In determining sentence selection or rejection, the
coherence of the abstract must be considered. For example,
if a sentence that requires an antecedent is to
be included in an abstract, it would be necessary to
determine if the previous sentence has been removed and,
if necessary, to reinstate it. If the restored sentence
also requires an antecedent, the procedure must be
repeated. But if several sentences would have to be
reinstated because of the required antecedents of one
sentence, then that sentence might as well be rejected.
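
The reinstatement procedure can be sketched as a loop over
the selected sentences. The text does not say how many
reinstatements count as "several", so the `max_reinstated`
threshold is an assumption, as are the function name and the
representation of the inputs.

```python
def enforce_coherence(selected, needs_antecedent, max_reinstated=2):
    """Reinstate antecedent sentences, or reject the dependent one.

    selected          -- indices of sentences chosen for the abstract
    needs_antecedent  -- list of booleans, one per document sentence
    max_reinstated    -- most sentences worth restoring for one
                         selected sentence (assumed threshold)
    """
    result = set(selected)
    for index in sorted(selected):
        chain = []
        current = index
        # Walk backwards while an antecedent is required but missing.
        while (needs_antecedent[current] and current > 0
               and (current - 1) not in result):
            current -= 1
            chain.append(current)
            if len(chain) > max_reinstated:
                break
        if len(chain) > max_reinstated:
            result.discard(index)   # too costly: reject the sentence
        else:
            result.update(chain)    # restore the missing antecedents
    return sorted(result)
```

Processing the selected sentences in document order means a
later sentence sees the outcome for earlier ones, so a
rejected sentence is never silently relied upon as an
antecedent.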

3. The Configuration

The automatic abstracting system consists basically
of a dictionary, called the Word Control List (WCL), and a
set of rules for implementing certain functions specified
for each WCL entry. To automatically produce abstracts,
it is necessary to identify and eliminate certain
sentences of the document. It is also necessary to
identify and select a few sentences for the abstract, and
to retain, by default, certain other sentences for the
abstract (see Figure 33). These three methods of
sentence handling are discussed below.


a. The Sentential Elimination

The exclusion of sentences from the
abstract involves the detection of words or strings of
words which identify sentences giving historical data,
results of previous work, examples, explanations,
speculative material and so on. A set of word strings
on the order of a few hundred is needed in order to
eliminate up to [...] of the sentences of a document. Such
word strings are incorporated in WCL.

WCL consists of an alphabetically ordered set of
words and phrases, which are referred to collectively as
word strings, and one or two associated codes (see Figure
33). The entries in WCL are treated as functions and each
has two arguments: a semantic weight (see Figure 33a) and a
syntactic value (see Figure 33b). Each function returns a
value which indicates whether the sentence is a
candidate for retention or deletion. Entries in WCL may
be varied as desired without necessitating any changes
in the programs of the system. In general, a WCL entry can
be represented as: Word String multiplied by Weight
multiplied by Value.

b. The Sentential Retention

In general, the semantic weight of a WCL entry
can be either positive or negative. Sentences containing WCL
entries which have positive semantic weights are retained;
otherwise, they are rejected. Sentences containing such word
strings as "this paper", "this study", or "present work" are
retained. It is nearly certain that the author of such a
sentence is about to say what the document is all about, and
the sentence should therefore be retained, because it does
belong in the abstract.

In addition, sentences which contain personal
pronouns such as "we", "I", or "our" are good candidates
for retention. Finally, sentences which contain significant
title words, that is, words of the title which are not in
the WCL, and which do not contain word strings having strongly
negative semantic weights, should also be retained. There
are no other instances in which a sentence is deliberately
retained, although a small number of sentences of the
document belonging to the abstract will be retained by
default.
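
As a minimal sketch of the elimination and retention ideas described above, the following Python fragment scores one sentence against a tiny, hypothetical WCL. The entries, the numeric weights, the "strongly negative" cut-off, and the choice to let a strongly negative entry override a positive one are all illustrative assumptions; the actual system uses lettered semantic codes and a rule set that is not reproduced here.

```python
import re

# Hypothetical miniature WCL: word string -> semantic weight (illustrative values).
WCL = {
    "this paper": 2, "this study": 2, "present work": 2,
    "we": 1, "i": 1, "our": 1,
    "previously": -2, "for example": -2, "obvious": -2,
}
STRONGLY_NEGATIVE = -2  # assumed cut-off for "strongly negative" weights


def classify(sentence, title_words=()):
    """Return 'eliminate', 'retain', or 'default' for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    text = " " + " ".join(words) + " "
    # Collect the weights of every WCL word string found in the sentence.
    weights = [w for s, w in WCL.items() if " " + s + " " in text]
    if any(w <= STRONGLY_NEGATIVE for w in weights):
        return "eliminate"          # assumption: negative evidence wins
    if any(w > 0 for w in weights):
        return "retain"
    # Significant title words are title words that are not WCL entries.
    significant = {t.lower() for t in title_words} - set(WCL)
    if significant & set(words):
        return "retain"
    return "default"
```

For instance, `classify("In this paper we describe a hashing scheme.")` yields `"retain"` under these assumed weights, while a sentence with no WCL match and no title word falls through to `"default"`.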

c. Rules for Implementing the Functions in the WCL

A viable solution for determining whether a
sentence of the document is a member of the abstract or not,
using a two-valued membership criterion, is to impose an
ordering on the semantic weights [23]. This ordering
provides considerable flexibility by incorporating the
rules in the program and by supplying the semantic weights
externally. The rules can be altered without the necessity
of changing the WCL, and the WCL can be altered independently of
the rules. There are a total of 19 rules.

Since the implementation of the semantic
weights requires some syntactic information, a partial
syntactic analysis of each sentence is performed. This
analysis is carried out through the use of the syntactic
values associated with entries in the WCL, in conjunction
with procedures implemented within the program. One of ten
possible syntactic values may be associated with an entry in
the WCL. These syntactic values distinguish auxiliary verbs
among verbs, and "is", "are", "was" and "were" are
distinguished from other auxiliaries. Similarly, the
prepositions "to", "as" and "of" are distinguished from
each other and from other prepositions [Ref. 37].


Figure 33. Word Control List

[illegible]  Used for very positive terms: those which
most unequivocally indicate something of importance
(e.g., our work)

A  Assigned to very negative terms: terms which
do not belong in an abstract (e.g., obvious, previously)

K  Assigned to terms which are related to items of
positive data content (e.g., important)

B  Parenthetical expressions, terms of low data
content, or terms which are associated with
items of low data content (e.g., however)

E  Used for intensifiers and determiners (e.g.,
many, more)

L  Introductory qualifiers (e.g., once, a)

C  Used for words which require an antecedent
(e.g., this, these)

H  Terms which introduce a modifying phrase or
clause (e.g., whose)

F  Null (assigned to abbreviations)

G  Assigned by the program to indicate intersentence
relationships or relation of sentence to title

J  Continuation of a semantic code assigned previously

D  Delete a word (can be used with any arbitrary
WCL entry)


Figure 33a. Semantic Attributes for WCL Entries

X  Exclusively assigned to IS, ARE, WAS, and WERE

2  Negatives


Figure 33b. Syntactic Values for WCL Entries

4. Data Structures for Automatic Abstracting

In implementing the data structures, some features of
lists, defined as pointers to the data, and some features of
tables, defined as storage of word attributes, are
incorporated. The data structures consist of the following
three (see Figure 34):

(1) A work area, where the text is stored
throughout the processing;

(2) An attribute vector, containing pointers to each
word of the text, and the length, semantic and
syntactic attributes for the corresponding words.
Textual properties such as capitalization
could also be incorporated in the attribute
vector. The nth element of the attribute vector
corresponds to the nth word of the text in the
work area;

(3) An alphabetic vector, which defines the
alphabetic rank of the words of the text.
The nth element of the alphabetic vector
contains the number of the attribute-vector
element which corresponds to the nth word
in alphabetic sequence. The alphabetic
vector permits matching against a dictionary
without reorganizing the data.

The attribute and alphabetic vectors are combined
to form a table. It is possible to reuse the space of the
alphabetic vector after all alphabetic processing has taken
place; hence combining the vectors results in an overall
space savings.
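
The three structures can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the work area is a single string with words separated by one blank, attribute-vector elements are plain dictionaries, elements are numbered from 0 rather than 1, and the function name `build_vectors` is hypothetical.

```python
def build_vectors(work_area):
    """Build the attribute vector and the alphabetic vector for a text."""
    attribute_vector = []
    address = 0
    for word in work_area.split(" "):
        attribute_vector.append({
            "address": address,      # pointer to the word in the work area
            "length": len(word),
            "semantic": None,        # filled in later from the WCL
            "syntactic": None,
        })
        address += len(word) + 1     # skip the word and the following blank

    def word_at(i):
        # Recover the word text through the pointer; ignore case and punctuation.
        entry = attribute_vector[i]
        start = entry["address"]
        return work_area[start:start + entry["length"]].strip(".,;:").lower()

    # nth element = number of the attribute-vector element holding the
    # nth word in alphabetic sequence; the text itself is never moved.
    alphabetic_vector = sorted(range(len(attribute_vector)), key=word_at)
    return attribute_vector, alphabetic_vector


text = "The rocks did not have sharply angular corners."
attributes, alphabetic = build_vectors(text)
```

With the sample sentence, the word addresses come out as 0, 4, 10, 14, 18, 23, 31 and 39, and the alphabetic vector lists "angular" first and "The" last, so a sorted dictionary can be matched in one pass without reordering the work area.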

5. Conclusions

The abstracts that have been obtained with this
system have been of sufficiently good quality. Results
have demonstrated that an 80% to 90% reduction of text is
obtained, at costs comparable to the cost of abstracts
produced manually [Ref. 37]. Furthermore, one of the main
advantages of this automatic system is that the
processing programs are text-independent. Thus, abstracts
written in any language with a structure similar to
English could be easily obtained, simply by substituting
an appropriate WCL. Another advantage is that abstracts with a
particular bias can be produced, simply by variation of
the WCL. Thus, tailor-made abstracts can be produced.

Research into automatic abstracting continues to
be attractive because of the following factors:

(1) Manual abstracting is expensive and time
consuming;

(2) Machine-readable journals will probably become
more widely available in the near future due to
the increasing use of computer-controlled
composition in the printing industry. This
will provide a relatively cheap and convenient
database for experimentation;

(3) Given the availability of machine-readable
journals, automatic abstracting will be much
cheaper and faster than manual abstracting;

(4) Automatic abstracting produces abstracts in
machine-readable form, ready to be used for later
processing steps;

(5) While automatic abstracts may never be as good
as manual abstracts, they are good enough for
practical purposes.


CONTENTS OF WORK AREA

MACHINE ADDRESS: 0 4 10 14 18 23 31 39

TEXT: The rocks did not have sharply angular corners.

TABLE CORRESPONDING TO THE WORK AREA

Columns: implied vector element number; attribute vector
(word length, word address, attributes); alphabetic vector
(table rows not recoverable).

E. OTHER ABSTRACTING METHODS

Other specific experimental automatic abstracting
systems include the four following systems.

The first is ADAM (The Automatic Document Abstracting
Method), which has been designed to produce indicative
abstracts, i.e., abstracts which enable the reader to judge
whether or not one needs to read the original document. Most
automatic abstracting methods differ from ADAM in two
important respects: they rely heavily on statistical
criteria as a basis for sentence selection and rejection,
and they are designed to select sentences for abstracts. In
contrast, ADAM uses statistical data only peripherally and
is designed for sentence rejection rather than selection.
The results of this experiment are as follows:

(1) The quality of ADAM extracts, while lower than that
of good manual abstracts, is functionally adequate;

(2) ADAM requires, on the average, 0.6 sec of computer
time per document;

(3) ADAM needs a specialized WCL for each subject area;

(4) ADAM abstracts can be improved by simple manual editing
[Ref. 38].

The second is ASISA (The Analysis of Semantic
Information Structure in the Abstract), which is a method
to extract significant phrases in the title and the
abstract of a scientific or technical document. The method
is based upon a text structure analysis and uses a
relatively small dictionary. The dictionary has been
constructed based on knowledge about concepts in the
field of science or technology and some lexical knowledge,
for significant phrases and their component items may be
used with different meanings among the fields. A text
analysis approach has been applied to select significant
phrases as substantial and semantic information
carriers of the contents of the abstract.

This system consists of five modules (see Figure 35):

(1) Text Input Module, which reads in one card-size record
at a time and decides whether it is retained or not
according to both the sign in the first column of the
record and the sign of its preceding record. The records of
the title and abstract are concatenated with each other
to form a single character string. Then the
character string is transferred to the next module. The
records of the keyword set are resolved into a
collection of keywords and stored in memory to be
compared with the extracted phrases later. The other
records are sent to the output module without any
processing;

(2) Term Extraction Module, in which the term as a
candidate for a meaningful item is extracted
from the character string of the title or of every
sentence of the abstract by dividing it at the
delimiters. The term obtained by this
process is recognized as a meaningful item and
transferred to the next module;

(3) Term Checking Module, which consists of both looking
up the dictionary and the ending processing. The
dictionary has been constructed only for the purpose
of extracting meaningful items based upon
knowledge about concepts of the field;

(4) Phrase Generation Module, in which the terms
accepted by this module have consequently been
classified into one of the following four kinds:
deletion words (D), adjective words (A), weak
noun words ([illegible]), and strong noun words (N). Using
these symbols, the character string can be viewed
as a sequence of symbols corresponding, in order,
to the sequence of words in the string;

(5) Output Module, which has two different kinds of
outputs, namely the output for
every document and the output for a set of documents.
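
The term extraction and symbol classification steps can be sketched roughly as follows. This is an illustration only: the delimiter set, the dictionary entries, the lower-case symbol "n" for weak noun words, and the default class for words missing from the dictionary are all assumptions, and the ending processing and the phrase generation rules themselves are not modelled.

```python
import re

# Hypothetical dictionary: word -> class symbol.
# D = deletion word, A = adjective word, N = strong noun word,
# n = weak noun word (symbol assumed for illustration).
DICTIONARY = {
    "a": "D", "the": "D", "of": "D",
    "automatic": "A",
    "method": "n",
    "abstracting": "N",
}
DELIMITERS = r"[.,;:()]"   # assumed delimiter characters


def extract_terms(text):
    """Divide a title or sentence into candidate terms at the delimiters."""
    return [t.strip() for t in re.split(DELIMITERS, text) if t.strip()]


def symbol_sequence(term, default="N"):
    """View a term as the sequence of class symbols of its words."""
    return [DICTIONARY.get(w.lower(), default) for w in term.split()]


terms = extract_terms("A method of automatic abstracting, based on text structure.")
symbols = symbol_sequence(terms[0])
```

Here the first extracted term, "A method of automatic abstracting", is viewed as the symbol sequence D, n, D, A, N.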

Following are some of the results of this particular
method. Significant phrases represented in the abstract
have been effectively extracted and are very compatible
with the author-prepared keywords. The number of whole
noun phrases extracted from the abstract is on the
average 1.5 times the number of author-prepared
keywords. The title is not an adequate source for
semantic contents analysis of the document, for [illegible]0% of
the titles consist of 1 to 2 words [Ref. 39].

The third is SIE (The Specialized Information
Extraction), whose task it is to obtain information
automatically from a natural language text, some
of which is of a highly stylized nature with a
restricted semantic domain, and place it in the
database. Specifically, SIE is designed to extract
information regarding chemical reactions from experimental
sections of papers in the chemical literature and to
produce a data structure containing the relevant
information.

In evaluating this system, the following three measures
have been utilized:

(1) Robustness, which is the percentage of inputs
handled;

(2) Accuracy, which is the percentage of those inputs
handled which are correctly handled;

(3) Error rate, which is the percentage of erroneous
entries within incorrectly handled input.
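
The three measures are simple ratios, as the following sketch shows. The function name, the argument names, and the sample counts are hypothetical and are not taken from the study.

```python
def evaluate(total_inputs, handled, correctly_handled,
             entries_in_incorrect, erroneous_entries):
    """Return (robustness, accuracy, error_rate) as percentages."""
    robustness = 100.0 * handled / total_inputs
    accuracy = 100.0 * correctly_handled / handled
    # Error rate is measured only over the incorrectly handled inputs.
    error_rate = 100.0 * erroneous_entries / entries_in_incorrect
    return robustness, accuracy, error_rate


# Made-up counts: 50 inputs, 40 handled, 30 of those handled correctly,
# and 50 erroneous entries among 200 entries in the incorrect outputs.
robustness, accuracy, error_rate = evaluate(50, 40, 30, 200, 50)
```

Note that accuracy is taken over the handled inputs only, so a system can score well on both robustness and accuracy while the remaining incorrect outputs are still dense with errors.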

The results are as follows: the robustness was [illegible]2%, but
the accuracy was only 78%, and since these were full of
errors, no error rate is computed. The reason for this is
that the most difficult aspect of SIE is the provision
of a safety factor, which is the ability of the system to
recognize inputs that it cannot handle. It is clear that
one can create a system that is robust and acceptably
accurate but which has unacceptable error rates for certain
inputs. If the system is to be useful, it must be
possible to determine automatically which documents
contain unacceptable error rates. If the safety factor can
be improved, SIE offers a promising area of application.

SIE programs are more feasible than automatic
translation because the restrictions have lessened the
ambiguity problems. This is true even in comparison to
other tasks with a restricted subject matter, such as
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messages  with  little  English  description  to  messages 
consisting entirely of English narratives. Currently, the
experimentation is limited to Navy operational reports.

The narrative portion of each message is automatically
transformed into a series of format entries using a
procedure which involves four stages of processing:
parsing, syntactic regularization, mapping into the
information format, and format normalization.

First, the text sentences are parsed using the
broad-coverage string grammar, which has been extended to handle
the sentence fragments which appear in these messages.

Second, the parse trees are syntactically regularized by
a series of transformations in order to simplify the
subsequent mapping into the information format (i.e., the
various types of clauses such as passive, relative, sentence
fragments and others are transformed into simple
active assertions).

Third, the phrases in the syntactically regularized
parse trees are moved, one by one, into the information
format. This process is controlled in large part by
the semantic word classes associated with each word.
These classes, along with syntactic information about the


[...] which can be reconstructed from earlier format entries.
Results from this study have underlined the importance of
continued research into two areas, knowledge
representation and robustness, in order to extend this
method into more diverse applications [Ref. 41].

Currently, the most promising application of all
the above techniques is in the extraction of information
from the highly restricted semantic domain of specialized
technical journals.


XI. DATA ACCESS AND RETRIEVAL METHODS


An important task of an operating system is the
maintenance of files. Certain facilities are provided by an
operating system for creating, destroying, organizing,
reading, writing, modifying, moving, copying, and
controlling access to files. The component of an operating
system that provides these facilities is usually referred to
as a file system. One of the roles that this file system
plays is that of an interface between a program and the files
that the program expects to access. Another role of a file
system is that of a supervisor that monitors files. The user
communicates indirectly with a file system via an operating
system through a set of predefined commands commonly called a
job command language, through assembly language programs, or
through programs written in a high-level programming language.
A high-level language program indirectly invokes an access
method via a get, put, read, or write statement. The execution
of these statements causes an access method routine to be
invoked that performs the requested input/output (I/O)
operation on the indicated file.

Access methods are file-system procedures that
interpret and satisfy user requests for storage and
retrieval of data. In short, they are the "go-between"
for a user program and a file. They can handle buffering
(holding data in the main memory), blocking (placing many
records into one block), and deblocking, and serve as
interfaces with devices [Ref. 41].

There are generally many access methods which are
provided by large operating systems. These are sometimes
grouped into two categories, namely queued access
methods and basic access methods.

The queued methods provide more powerful capabilities
than the basic methods. Queued access methods are used
when the sequence in which records are to be processed
can be anticipated, such as in sequential accessing
(accessing ordered data). The queued methods perform
anticipatory buffering and scheduling of I/O operations.
They try to have the next record available for processing
as soon as the previous record has been processed. More
than one record at a time is maintained in primary
storage. This allows processing and I/O to be
overlapped. The queued access methods also perform
automatic blocking and deblocking.

On the other hand, the basic access methods are
normally used when the sequence in which records are to be
processed cannot be anticipated, particularly with direct
accessing. Also, there are many situations in which user
applications want to control record accesses without
incurring the overhead of the queued methods. In the basic
methods, unlike the queued methods, the user must
perform blocking and deblocking [Ref. 42].
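
As a small illustration of what blocking and deblocking involve, the sketch below groups logical records into fixed-size blocks and unpacks them again. The function names and the use of Python lists for blocks are assumptions made for the example; a real access method works on device buffers.

```python
def block(records, blocking_factor):
    """Blocking: place up to blocking_factor logical records in each block."""
    return [records[i:i + blocking_factor]
            for i in range(0, len(records), blocking_factor)]


def deblock(blocks):
    """Deblocking: hand back the logical records one at a time."""
    for blk in blocks:
        for record in blk:
            yield record


blocks = block(list("abcdefg"), 3)
records = list(deblock(blocks))
```

A queued access method would do this on the program's behalf and read the next block ahead of time; with a basic access method the user program carries out both steps itself.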

During the last decade there has been considerable
interest in file structures suitable for storing large
dynamic files of records. Dynamic means that records can be
inserted into and deleted from the file, causing the size of
the file to vary. In a static file, records are not inserted
or deleted and attribute values are only updated.

The file structures intended for retrieval on primary
key can be divided into two classes: those based on
indexing, which makes use of key fields to provide access
to a file, and those based on hashing, which provides
rapid access to a file.

The indexing techniques were mostly developed
during the sixties and at the beginning of the seventies,
while hashing schemes are more conventional. We discuss
hashing first and indexing next in the following sections.

A. THE USE OF HASHING FOR DATA ACCESS

In most on-line systems, the dominant mode of file
access is random, as is the case for reservation systems
for airlines, hotels, and car rentals, and information
retrieval systems for libraries and stock market
quotations. In these systems, both updating and retrieval
are accomplished in the random mode, and there is rarely a
need for sequential access to the data records. In such
applications, a hashed file organization is often
preferred. A hashed file organization provides rapid
access to individual records, since it is not necessary to
search indexes. However, a low loading factor is
traded for rapid access. In other words, in all hashed
files, there are many scattered and unused file spaces
which are left unloaded with file data. Consequently,
hashed files take more secondary storage than indexed
files.

The basic idea behind a hashed access file organization
is that the records of a file are divided among buckets,
each of which consists of one or more records for storage.
The major components (see Figure 36) associated with hashed
files include the identifier, the transformation, and the
primary and overflow storage areas. The primary storage area is
divided into a number of addressable locations, called
buckets, which are simply physical storage blocks. Each
bucket consists of one or more slots, where records may be
stored. Records are assigned to buckets by means of a
hashing routine, which is an algorithm that converts each
primary key value into a relative disk address or bucket
number. Ideally, the hashing routine that is chosen
should distribute the records as uniformly as possible
over the address space to be used. This provides the
following two important benefits: first, collisions, which
occur when two or more records are assigned to the same
bucket, are minimized, and secondly, file space is utilized
as efficiently as possible. A record is stored in its
bucket if there is an empty slot. If all slots in the
bucket are full, then the record is stored in an overflow
area of the bucket.

One hashing algorithm that has been proposed and that
consistently performs best under most conditions is the
division/remainder method, which works as follows:

(1) First, the number of buckets to be allocated
to the file must be determined, and a prime
number that is approximately equal to
this number is selected;

(2) Secondly, each primary key value is divided by
the prime number, and the remainder is used as
the relative bucket address.
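
The two steps above, together with the slot and overflow handling described earlier, can be sketched as follows. The class and helper names are hypothetical, integer keys are assumed, "approximately equal" is read here as the largest prime not exceeding the requested bucket count, and each bucket's overflow area is modelled as a single extra list that costs one additional access.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_prime(bucket_count):
    """Step (1): pick a prime close to the desired number of buckets."""
    if bucket_count < 2:
        raise ValueError("need at least two buckets")
    p = bucket_count
    while not is_prime(p):
        p -= 1
    return p


class HashedFile:
    def __init__(self, bucket_count, slots_per_bucket):
        self.prime = choose_prime(bucket_count)
        self.slots = slots_per_bucket
        self.primary = [[] for _ in range(self.prime)]
        self.overflow = [[] for _ in range(self.prime)]

    def address(self, key):
        """Step (2): the remainder is the relative bucket address."""
        return key % self.prime

    def insert(self, key, record):
        b = self.address(key)
        home_has_room = len(self.primary[b]) < self.slots
        (self.primary[b] if home_has_room else self.overflow[b]).append((key, record))

    def find(self, key):
        """Return (record or None, number of accesses made)."""
        b = self.address(key)
        for k, r in self.primary[b]:
            if k == key:
                return r, 1
        if not self.overflow[b]:
            return None, 1
        for k, r in self.overflow[b]:
            if k == key:
                return r, 2
        return None, 2


f = HashedFile(bucket_count=10, slots_per_bucket=2)
for key, rec in [(15, "a"), (22, "b"), (29, "c")]:
    f.insert(key, rec)
```

With ten buckets requested the sketch selects the prime 7, so keys 15, 22 and 29 all collide at relative address 1; the third record goes to the overflow area and costs a second access to retrieve.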

To retrieve a record in a hashed file, the hashing
algorithm is applied to the primary key value to calculate
the relative bucket address. If the record is located at
its home address, then only one disk access is required. If
it is in an overflow area, then two or more accesses are
required. The number of accesses per record, or average
search length, is computed as follows:

average search length = (number of records + number of disk accesses) /
number of records [Ref. 43].

Most file systems today support hashed files. In fact,
the ease of using hashed files, of which there are six
types to be discussed, is one of the major advantages of a
file system. These six types are linear hashing, dynamic
hashing, linear hashing with partial expansions,
interpolation hashing, extendible hashing, and coalesced
hashing.

1. The Linear Hashing

The starting point for linear hashing is a
traditional hash file where overflow records are handled
by bucket chaining, which is the method where overflow
records are stored by linking one or more overflow
buckets from a separate storage area to an
overflowing bucket. Each overflowing bucket has its own
separate chain of overflow buckets.

An inherent characteristic of hashing techniques is
that higher storage utilization results in increased search
lengths, both for successful and unsuccessful searches. If
the search performance of a growing file is to remain
within acceptable limits, additional storage must somehow
be allocated to the file. Linear hashing increases the
storage space gradually by splitting the primary buckets
in an orderly fashion: first bucket 0, then bucket 1,
etc. A pointer p keeps track of which bucket is the next
to be split.

In general terms, to implement linear virtual
hashing, starting from a file of N buckets, we need a
sequence of hashing functions $H_0, H_1, H_2, \ldots$
satisfying a range condition,

$H_i(K) \in \{0, 1, 2, \ldots, N \cdot 2^i - 1\}, \quad i = 0, 1, 2, \ldots$

and a split condition, where for each key K, either

(1) $H_{i+1}(K) = H_i(K)$, or

(2) $H_{i+1}(K) = H_i(K) + N \cdot 2^i, \quad i = 0, 1, 2, \ldots$

The hashing functions are assumed to hash
randomly and hence event (1) and event (2) are equally
likely. An example of such a sequence is the
division/remainder technique used above, where
$H_i(K) = K \bmod (N \cdot 2^i)$.

In order to compute the address of a record at any
time we must keep track of the state of the file. This
can be done with two variables, say p and q. The
variable p denotes the level of the file and counts the
number of times the size of the file has doubled, while
the variable q is a pointer to the next bucket to be split.
A simple algorithm (see Figure 37) exists to compute both
the address h of a record with key K and the splitting of
the bucket.

Retrieval of a record is simple. First, $H_0(K)$ is
computed, and if $H_0(K) \geq q$ the bucket has not yet been
split and $H_0(K)$ is the desired address. If $H_0(K) < q$
(i.e., the bucket has been split), then $H_1(K)$ is computed and
gives the address. Finally, a method is required to
establish rules for deciding when splitting of the next
bucket is to take place, called the control function. Of
the several alternatives available, the following two
strategies are noted, called uncontrolled and controlled
splitting.

Figure 37. Linear Hashing Algorithm

Uncontrolled means that a bucket is split whenever
an inserted record is placed in the overflow area. This
rule leads to low storage utilization, but good retrieval
performance. If we want to achieve higher storage
utilization, the control function must be more restrictive.

Controlled splitting allows splitting of a bucket to
take place only when an inserted record is placed in the
overflow area and the overall storage utilization is above
a certain pre-determined threshold. This rule
leads to better storage utilization but
slower retrieval [Ref. 44].
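
The address computation and the two splitting strategies can be sketched as below. This is not the algorithm of Figure 37, which is not reproduced here, but a minimal illustration under stated assumptions: integer keys, the division/remainder functions with modulus N times 2 to the power i, overflow chains folded into the same Python list as the primary bucket, and hypothetical names throughout. A `threshold` of `None` stands for uncontrolled splitting.

```python
class LinearHashFile:
    def __init__(self, n_buckets=2, capacity=2, threshold=None):
        self.n = n_buckets          # N, the initial number of buckets
        self.capacity = capacity    # records per primary bucket
        self.level = 0              # how many times the file has doubled
        self.next = 0               # pointer to the next bucket to be split
        self.threshold = threshold  # None = uncontrolled splitting
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0

    def _h(self, i, key):
        return key % (self.n * 2 ** i)

    def address(self, key):
        h = self._h(self.level, key)
        if h < self.next:           # that bucket has already been split
            h = self._h(self.level + 1, key)
        return h

    def utilization(self):
        return self.count / (len(self.buckets) * self.capacity)

    def insert(self, key):
        bucket = self.buckets[self.address(key)]
        bucket.append(key)
        self.count += 1
        overflowed = len(bucket) > self.capacity
        if overflowed and (self.threshold is None
                           or self.utilization() > self.threshold):
            self._split()

    def _split(self):
        # Split the bucket at the pointer, not necessarily the full one.
        keys, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])     # new bucket number next + N * 2**level
        for k in keys:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == self.n * 2 ** self.level:   # file size has doubled
            self.next = 0
            self.level += 1


f = LinearHashFile(n_buckets=2, capacity=2)
for key in (0, 2, 4):
    f.insert(key)
```

After the three insertions bucket 0 has overflowed and been split, so key 2 now lives in the newly added bucket 2 while the pointer has advanced to bucket 1. Passing, say, `threshold=0.9` would have postponed that split, which illustrates the trade-off between storage utilization and retrieval speed.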

Figure 38 illustrates how linear hashing works.

2. The Dynamic Hashing

The dynamic hashing scheme is based on normal
hashing except that the allocated storage space can easily
be increased and decreased without reorganizing the file,
according to the number of records actually stored in the
file. The expected storage utilization is approximately
69% at any time, and there are no overflow records.
The price which must be paid for this is the
maintenance of a relatively small index. If this
index is available in main storage, only one access to
secondary storage is necessary when searching for a record.
The file organization for this scheme employs a data
structure consisting of a data file in which the data
records are stored, and an index to the data file. The
index is organized as a forest of binary trees. The
hash trees used here are closely related to binary tries.
The data file consists of a variable number of buckets of
fixed size. The set of records to be stored at a certain
time is denoted by $R_i$, where i = 1, 2, ..., n. The number of
records n is not fixed but may vary with time. A record $R_i$
is assumed to contain a unique key $K_i$. The set of keys is
denoted by $K_i$, where i = 1, 2, ..., n. Each bucket in the
data file has a capacity of b records.

The file is initialized in much the same way as a
normal hash file. The secondary storage space is allocated
for H buckets. In the index, H entries are initialized, one
entry for each bucket, each entry containing a pointer to
a bucket in the data file. Each index entry is either an
internal node, which contains pointers to its father and
sons, or an external node, which contains, besides the
pointer to its father, a pointer to a bucket in the data
file and the number of records actually stored in this
bucket. The initial buckets are said to be on level
zero. A hashing function $H_0$ for distributing the
records among the buckets is also needed. The value
$H_0(K_i)$ in this case is used to define an entry point in the
index and does not refer directly to a bucket. The bucket
is found by means of the pointer in the corresponding
entry.

When the file is properly initialized we can start
loading the file. This is also done in approximately the
same way as for a normal hash file, but using the index
to locate the buckets. Sooner or later a bucket will
overflow, i.e., an attempt is made to insert a record in a
bucket that is already full. When this happens the
bucket is split into two. The storage space for a new
bucket is allocated and the records are distributed equally
among the two buckets. At the same time the index is
updated to depict the new situation. Additional records that
would have been stored in the split bucket are distributed
between the two buckets. If later one or the other of the two
buckets becomes full, it in turn is split into two
buckets. Figure 39 illustrates the structure of a dynamic
file after three splits. The levels represent the hashing
table or directory, the circles represent the branch nodes,
the squares represent the leaf nodes, and the arrows
represent the record addresses.

When the number of stored records decreases, the
allocated space can also be decreased. When the number of
records in two brother buckets becomes less than or equal
to the capacity of one bucket, the two brother buckets are
merged into one, and one bucket can be freed. Two buckets
are brothers if the corresponding external nodes have the
same father node. At the same time the corresponding search
tree is updated.
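
The index as a forest of binary trees, with splitting on overflow and merging of brother buckets, might be sketched as follows. Several details are assumptions not given in the text: non-negative integer keys that are unique, a remainder function as the entry-point hash, a pseudo-random bit sequence seeded by the key to choose left or right at each level, and buckets held directly in the external nodes rather than in a separate data file.

```python
import random


class Node:
    def __init__(self, father=None):
        self.father = father
        self.children = None   # None marks an external (leaf) node
        self.bucket = []       # records of an external node


def bit(key, depth):
    """Bit number depth of a reproducible pseudo-random sequence for key."""
    rng = random.Random(key)
    for _ in range(depth):
        rng.getrandbits(1)
    return rng.getrandbits(1)


class DynamicHashFile:
    def __init__(self, entries=2, capacity=2):
        self.capacity = capacity
        self.roots = [Node() for _ in range(entries)]   # level-zero entries

    def _find_leaf(self, key):
        node, depth = self.roots[key % len(self.roots)], 0
        while node.children is not None:
            node = node.children[bit(key, depth)]
            depth += 1
        return node, depth

    def contains(self, key):
        return key in self._find_leaf(key)[0].bucket

    def insert(self, key):
        node, depth = self._find_leaf(key)
        node.bucket.append(key)
        while len(node.bucket) > self.capacity:      # overflow: split in two
            records, node.bucket = node.bucket, []
            node.children = [Node(node), Node(node)]
            for k in records:
                node.children[bit(k, depth)].bucket.append(k)
            # Rarely, every record lands in one son; keep splitting that son.
            node = max(node.children, key=lambda c: len(c.bucket))
            depth += 1

    def delete(self, key):
        node, _ = self._find_leaf(key)
        node.bucket.remove(key)
        father = node.father
        if father is not None:
            a, b = father.children
            both_external = a.children is None and b.children is None
            if both_external and len(a.bucket) + len(b.bucket) <= self.capacity:
                father.bucket = a.bucket + b.bucket   # merge the brothers
                father.children = None

    def leaves(self):
        stack = list(self.roots)
        while stack:
            node = stack.pop()
            if node.children is None:
                yield node
            else:
                stack.extend(node.children)


f = DynamicHashFile(entries=2, capacity=2)
for key in range(50):
    f.insert(key)
```

Because every bucket is split as soon as it overflows, no external node ever holds more than `capacity` records, and a lookup walks one tree of the index before touching a single bucket.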

Some of the most important characteristics of this
hashing scheme are as follows. The allocated physical
storage space is easily increased or decreased as
required by the actual number of records stored. There
is no overflow problem, since overflow records do not
occur. Retrieval is fast. The retrieval of a record
requires only one access to secondary storage, provided
that the forest of hashing trees is in main storage. Also,
it is a simple task to find, insert or delete a record
with key K, or to establish the fact that a certain
record is not in the file.

Simple algorithms exist to accomplish this.
Although the tree structure for this scheme is simple,
it does lead to additional storage requirements. Several
variants of dynamic hashing have been proposed to
compensate for this drawback. The first variant is
dynamic hashing with deferred splitting, in which the
splitting is deferred until both the bucket itself and
its brother are full. In other words, if the "home" bucket
is full, an attempt is made to store the record in its brother
bucket. If this is full as well, or has already been
split, the "home" bucket is split. This modification
leads to better storage utilization but requires more
complicated (and therefore slower) algorithms for searching,
insertion and deletion. Also, searching is slower, since it
may be necessary to search two buckets. Furthermore, it is
evident from experiments [Ref. 45] that deferred
splitting leads to more unstable storage utilization.
Thus, there is a trade-off between fast retrieval and high
storage utilization.

Another variant of dynamic hashing is linear
splitting, in which the file supports a larger number of
records before the index overflows onto secondary
storage, provided that the index is available in main
storage. Experiments show that linear splitting performs
just as well as deferred splitting. The advantage over
deferred splitting is that the index node size is smaller.
On the other hand, as far as the number of accesses to
secondary storage is concerned, this scheme is rather
unfair, since some records are accessed very fast, while
other records may require a large number of accesses [Ref.
45]. Linear hashing evolved from this second scheme.


3. The Linear Hashing with Partial Expansions

The linear hashing with partial expansions is a
dynamic hashing scheme that is a generalization of
linear hashing and contains linear hashing as a special
case. The main advantages of linear hashing are
retained: the file size grows and shrinks gracefully, there
is no index, and the maintenance and retrieval algorithms
are simple. However, a significantly better search
performance can be achieved. Using this technique a dynamic
file with a constant storage utilization of 85 %, for
example, can be designed, where retrieval of a record
requires an average of only 1.05 accesses to secondary
storage.

The above scheme is based on the observation that
an important characteristic of hashing techniques is that
the best performance is achieved when the records are
distributed as uniformly as possible over all the
buckets in the file. The record distribution of
linear hashing deviates quite radically from this ideal.
The load factor of a bucket already split is only half
the load factor of a bucket not yet split. If a more even
load could somehow be achieved, the performance of the file
would be considerably improved.

The main difference compared to linear
hashing is that the doubling of the file size is done
in a series of partial expansions. If this is done in
two steps, the first expansion increases the file size to
1.5 times the original size, while the second expansion
increases it to twice the original size (see the accompanying
figure). Note that when examining a group of buckets, it is not
necessary to rearrange records among the old buckets. It
suffices to scan through the old buckets collecting
only those records which are to be re-allocated to the
new bucket. This makes it possible to make the
expansion in one scan, and no jumping back and forth is
necessary.

The performance will be cyclical, where a cycle
corresponds to a full expansion. The rate of expansion is
governed by a control function, which is simply a set of
rules for determining when the next expansion is to take
place [Ref. 46].

The performance measures are the same as for dynamic
hashing: length of successful and unsuccessful searches,
cost of inserting a record, and cost of deleting a
record. The cost measure is the same for all operations:
the expected number of accesses to secondary storage
required to carry out the operation in question.
Simple algorithms exist that compute expected overflow space
requirements, performance measures for the expected value
at any point of a partial expansion, and the average of the
expected values over a full expansion. Also, contrary to
what is often believed, the analysis reveals that the
longest probe sequence is not expected to be very long for
normal parameter combinations.


The above control function is optimal in a
certain sense. Everything else being equal, there is a
trade-off between storage utilization and the expected
length of successful searches. The higher the storage
utilization is, the longer the searches are expected to
be. Either one of the two factors can be controlled, but not
both simultaneously. One proposal is that storage
utilization is controlled by requiring that it should
always be higher than, or equal to, some threshold.
Once the threshold has been fixed, the expected length of
successful searches will be minimized by always keeping
storage utilization as low as possible. The rule above
allows storage utilization to go slightly below the
threshold, simply because the storage utilization after an
expansion is not known before the expansion has been made.
However, when the number of buckets in the file is
moderately large, this rule will result in a storage
utilization which, for all practical purposes, is constant
and equal to the threshold [Ref. 46].
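
The effect of such a threshold rule can be illustrated with a
small calculation. The sketch below is not the control function
of [Ref. 46]; it is a simplified model under stated assumptions:
storage utilization is taken to be records / (buckets * bucket
capacity), overflow space is ignored, and every expansion adds
exactly one bucket. The function name and all parameter values
are hypothetical.

```python
def simulate_control_rule(num_records, bucket_capacity=20,
                          threshold=0.85, initial_buckets=4):
    """Insert records one at a time; expand whenever utilization
    exceeds the threshold. Returns (final bucket count, utilization
    observed after each insertion)."""
    buckets = initial_buckets
    history = []
    for records in range(1, num_records + 1):
        # Expand one bucket at a time while the file is "too full".
        # Right after an expansion the utilization may dip slightly
        # below the threshold, as noted in the text.
        while records / (buckets * bucket_capacity) > threshold:
            buckets += 1
        history.append(records / (buckets * bucket_capacity))
    return buckets, history


buckets, history = simulate_control_rule(10_000)
print(buckets, round(history[-1], 4))
```

In this model the utilization never exceeds the threshold, and
once the file holds a few hundred buckets it stays within about
one percent below it, which matches the qualitative claim that
the utilization is constant for all practical purposes.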

4. The Interpolation Hashing

The interpolation hashing scheme is an adaptation of
linear hashing. This scheme supports the desired
operations insert, delete, update and range query. As
before, the hash functions will map records to chains.
Each chain will be associated with a region; however, the
association of chains and regions will vary as records are
inserted into and/or deleted from the file. The ensemble of
chains partitions the key space into disjoint regions of
equal volume.

The conditions on the hash functions are meant to capture
the order-preserving nature of the scheme. The functions of
the sequence H = (h0, h1, h2, ...) are split functions
for h0 provided the following range and split
conditions are satisfied for all i:

The range condition of H is: $h_i : K \to \{0, 1, 2, \ldots, 2^{i} - 1\}$

The operations insert, delete, and update each
concern a single record at most. In view of the above
remarks in reference to h, these operations are identical to
the equivalent operations in linear hashing. The
implementation of the range query operation is unique
to this scheme. The notion of range query includes both
the exact match query and the partial match query as special
cases. That is to say, an exact match query is a range query
for which u = v, where u and v are a pair of points
representing the range query; a partial match query is a
range query where for some components u(j) = 0 and
v(j) = 1, while for the remaining components u(j) =
v(j). Now, as expected, the set of records corresponding
to the pair (u,v) can change with insert, delete, and
update operations. The set of chains associated with the
pair (u,v), however, will not change [Ref. 47].
Interpolation hashing handles the range query efficiently by
ordering the key attribute values. A range query
cannot be answered efficiently with a file structure that
has keys scattered all over the buckets; this
scheme, by contrast, preserves the order of keys. Access and split
algorithms are exactly the same as for linear hashing; only
the hash function has been changed [Ref. 42].

5. The Extendible Hashing

Extendible hashing is a fast access method for
dynamic files. With this technique, the user is guaranteed
no more than two page or bucket faults to locate the data
associated with a given unique identifier, or key. Unlike
other hashing schemes, extendible hashing has a dynamic
structure that grows and shrinks as the database grows and
shrinks. This approach simultaneously solves the problem
of making hash tables that are extendible and of making
radix search trees that are balanced. Radix search trees,
also known as digital search trees, or tries, which
examine a key one digit or letter at a time, have long
been known to provide potentially faster access than tree
search schemes that are based on comparisons of entire
keys, for the simple reason that one comparison leads to a
larger fan-out (equal to the number of characters in
the alphabet underlying the key space). In practice,
however, radix search trees tend to be used only for small
files, since they often waste memory. Usually, this wasted
memory occurs at the nodes near the bottom of the tree,
since a trie normally contains space for many keys
not in the table, because the scheme of allocating a
field for each character of the alphabet at each node is
better suited to representing the entire key space rather
than the contents of a particular file.

The most important performance characteristic of
extendible hashing is its speed. Even for files that are
very large by current standards, there are never more
than two page faults necessary to locate a key and its
associated information. In order to utilize this scheme,
the file is structured into two levels (see Figure 41):
directory and leaves. The leaves contain pairs (K, I(K)),
where K is a key, and I(K) is associated information,
which is either the record associated with K, or a pointer
to the record. The directory has a header, in which is
stored a quantity called the depth, d, of the directory.
After the header, the directory contains pointers to leaf
pages or buckets. The pointers are laid out as follows.
First, there is a pointer to a leaf that stores all keys
K for which the pseudokey K' = h(K) starts with d
consecutive zeros. This is followed by a pointer for all
keys whose pseudokeys begin with the d bits 0...01, and then
a pointer for all keys whose pseudokeys begin 0...010,
and so on, lexicographically. Thus, altogether there are
2 to the d pointers (not necessarily distinct), and the
final pointer is for all keys whose pseudokey begins with d
consecutive ones. The depth of the directory is the
maximum of the local depths of all of the leaf blocks.

In the situation where there is a single leaf
block, with local depth 1, which finally overfills,
or reaches a predetermined unacceptably full level, such
as 90 % full, it would split into two leaf pages, each
with local depth 2. On the other hand, if a leaf block
overfills, and the local depth of the leaf block already
equals the depth of the directory, then the directory
doubles in size, its depth increases by 1, and the leaf
page splits. This process of doubling the directory is not
expensive because no leaf blocks need to be touched
(except, of course, for the leaf block that caused the
split and its new sibling). For example, if there are a
few million keys when the directory doubles, and if the
secondary storage device has a data transfer rate of around
a million bytes per second, then it is straightforward to
estimate that the time involved in doubling the directory
(which is mainly data transfer time) would be less than a
second if there were 400 keys per leaf block. Even in the
extreme case of a billion keys, the time involved in
doubling the directory would be less than a minute. A
number of advantages accrue from the simple, intuitive
structure of extendible hashing. The most obvious is
the simplicity of the coding, thus leading to a lower
likelihood of bugs. Also, the extendible hashing
algorithm is easily modified to accommodate individual
needs [Ref. 48]. Moreover, its operating costs, as analyzed
by Mendelson [Ref. 49], are fairly low. Furthermore, there
is an easy, essentially one-pass algorithm for doubling
the directory, which proceeds by working from the bottom of
the old directory to the top of the old directory.

This technique provides an attractive alternative to
other access methods.
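
The directory layout, leaf splitting, and directory doubling
described in this subsection can be sketched as follows. This is
an illustrative in-memory sketch rather than the implementation
of [Ref. 48]: the 32-bit pseudokey derived from SHA-256, the leaf
capacity of 4, and the names ExtendibleHashFile, Leaf, WIDTH and
LEAF_CAPACITY are all assumptions made for the example, and
deletion (directory shrinking) is omitted.

```python
import hashlib

WIDTH = 32         # assumed pseudokey width in bits
LEAF_CAPACITY = 4  # assumed number of (K, I(K)) pairs per leaf


def pseudokey(key):
    """Map a key to a WIDTH-bit pseudokey K' = h(K)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big")


class Leaf:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}  # key -> associated information


class ExtendibleHashFile:
    def __init__(self):
        self.depth = 0              # d, kept in the directory header
        self.directory = [Leaf(0)]  # 2**d pointers, not necessarily distinct

    def _index(self, key):
        # The leading d bits of the pseudokey select the directory entry.
        return pseudokey(key) >> (WIDTH - self.depth)

    def get(self, key):
        # One directory access plus one leaf access.
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, info):
        while True:
            leaf = self.directory[self._index(key)]
            if key in leaf.items or len(leaf.items) < LEAF_CAPACITY:
                leaf.items[key] = info
                return
            self._split(leaf)

    def _split(self, leaf):
        if leaf.local_depth == WIDTH:
            raise OverflowError("pseudokey bits exhausted")
        if leaf.local_depth == self.depth:
            # Double the directory: every old pointer is duplicated,
            # and no leaf other than the overfull one is touched.
            self.directory = [p for p in self.directory for _ in (0, 1)]
            self.depth += 1
        new_depth = leaf.local_depth + 1
        low, high = Leaf(new_depth), Leaf(new_depth)
        for k, v in leaf.items.items():
            bit = (pseudokey(k) >> (WIDTH - new_depth)) & 1
            (high if bit else low).items[k] = v
        # Redirect the directory entries that pointed to the old leaf.
        for i, p in enumerate(self.directory):
            if p is leaf:
                bit = (i >> (self.depth - new_depth)) & 1
                self.directory[i] = high if bit else low
```

Note that the depth of the directory always equals the largest
local depth among the leaves, and that several directory entries
may share one leaf whenever that leaf's local depth is smaller
than the directory depth.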

6. The Coalesced Hashing

The term coalescing refers to the phenomenon in
which a record R1 collides with another record R2 that
previously collided elsewhere, and R1 is linked into R2's
chain even though the two records have different hash
addresses.

Coalesced hashing is a very efficient technique
for storing and retrieving information dynamically. The
algorithm combining storage and retrieval works as follows.
Given a record with the key K, the algorithm searches for
it in the hash table, starting at its hash address hash(K)
and following the links in the chain. If the record
is found, the search is successful; otherwise, the end
of the chain is reached and the search is unsuccessful, in
which case the record is inserted as follows: if the
position hash(K) is empty, then the record is stored at
that location; otherwise, the record is stored in the
largest-numbered empty slot in the table and linked into
the chain that contains slot hash(K) (i.e., at some point
in the chain after slot hash(K)).

There are several different ways to link that
record into the chain. The conventional method links the
record to the end of the chain that contains the slot
hash(K). Another method is to insert the record into the
chain immediately after slot hash(K) by rerouting
pointers. This method is called early-insertion coalesced
hashing or EICH, because the record is inserted early
into the chain; whereas the conventional method is referred
to as "late-insertion coalesced hashing" or LICH. For
standard coalesced hashing, i.e., when there is no cellar
(a cellar refers to the extra space reserved for storing
colliders), these methods are abbreviated EISCH and LISCH.
Insertion can be done faster with the early-insertion
method when it is known a priori that the record is not
already present in the table, since it is not necessary to
search to the end of the chain.
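
The late-insertion and early-insertion methods can be compared
with a short sketch of standard coalesced hashing (no cellar).
The sketch makes several simplifying assumptions that are not
taken from the references: keys are non-negative integers, the
hash function is the key modulo the table size, records are never
deleted, and the class name CoalescedHashTable is hypothetical.

```python
class CoalescedHashTable:
    def __init__(self, size, early_insertion=False):
        self.keys = [None] * size
        self.links = [None] * size
        self.free = size - 1          # scans downward for the largest empty slot
        self.early = early_insertion  # True: EISCH, False: LISCH

    def _hash(self, key):
        return key % len(self.keys)   # assumed hash function

    def search(self, key):
        """Return the slot holding key, or None if it is absent."""
        slot = self._hash(key)
        if self.keys[slot] is None:
            return None
        while slot is not None:
            if self.keys[slot] == key:
                return slot
            slot = self.links[slot]
        return None

    def insert(self, key):
        home = self._hash(key)
        if self.keys[home] is None:
            self.keys[home] = key
            return home
        # Follow the chain from the home slot to its end.
        slot = home
        while True:
            if self.keys[slot] == key:
                return slot            # already present
            if self.links[slot] is None:
                break
            slot = self.links[slot]
        tail = slot
        # Locate the largest-numbered empty slot.
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        new = self.free
        self.keys[new] = key
        if self.early:
            # Early insertion: link in right after the home slot.
            self.links[new] = self.links[home]
            self.links[home] = new
        else:
            # Late insertion: link in at the end of the chain.
            self.links[tail] = new
        return new
```

With a table of 10 slots and the insertion sequence 1, 11, 9, 21,
key 9 hashes to slot 9, which already holds the collider 11, so
the chains of addresses 1 and 9 coalesce. Late insertion then
produces the chain 1, 9, 8, 7 (by slot number), whereas early
insertion produces 1, 7, 9, 8.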

Search times can be improved by devoting the
bottom portion of the table to a cellar, in which only
colliding records can be stored. Colliders that are
stored in the cellar are thus protected from being
collided into by records inserted later. Coalescing
cannot occur until the cellar becomes full and
colliders begin to be stored in the address region.

As far as performance is concerned, without a cellar EISCH
has faster search performance than LISCH, but when the
early-insertion coalesced hashing method is used with
a cellar, the average search performance is inferior
to that of LICH, since in EICH the records of a chain
that are in the cellar come at the end of the chain,
whereas in LICH they come immediately after the first
record in the chain.

A new variant of coalesced hashing is called varied
insertion, which combines the advantages of early
insertion and late insertion. This method (VICH) is
slightly different from EICH, in that the collider is
inserted immediately after its hash address, as in EICH,
except when the cellar is full, when there is at least
one cellar slot in the chain, and when the hash address of
the collider is the location of the first record in the
chain. In that case, the collider is linked into the
chain immediately after the last cellar slot in the chain.
For the case of standard coalesced hashing, the two
methods are identical; that is, varied-insertion
standard coalesced hashing (VISCH) is the same as
early-insertion standard coalesced hashing (EISCH).
Varied insertion is superior to both early
insertion and late insertion with respect to search time.

VICH is search-time optimum among all direct chaining
methods, under the assumptions that the records are not
moved once they are inserted, that for each chain the
relative order of its records does not change after
further insertions, that there is only one link field per
table slot, and that cellar slots get priority on the
available-slots list. It remains an open problem whether
there are methods with faster search times than VICH if
we remove the last assumption [Ref. 50].

7. A Summary

File organizations based on hashing are suitable
for data whose volume may vary rapidly and for rapid data
accesses at the expense of lower loading factors. In the
different variants of hashing previously discussed,
rehashing is avoided. They do not require any thorough
reorganization of the file. Further, the storage space is
dynamically adjusted to the number of records being
stored and there are no overflow records. Some of the
techniques employ an index to the data files; others do
not. The retrieval is fast; the storage utilization is low.

In order to increase storage utilization, new
schemes have been discussed. In these schemes, overflow
records are accepted, and the price which has to be paid for
the improvement in storage utilization is a slight
access cost degradation.

Dynamic hashing schemes and extendible hashing
schemes employ an index to the data file; therefore, once
the bucket's address has been found, the retrieval is fast.
Extendible hashing is implemented by means of
partitioning. In contrast, dynamic hashing schemes are
implemented by means of a tree structure, which grows and
shrinks more smoothly, but the index node size is larger
than that of extendible hashing's index entry.

The linear hashing schemes are similar to
extendible hashing but do not employ any index. The
retrieval of a record then requires only one access to
secondary storage. The price to be paid for this is a very
low storage utilization, compared to the other schemes.

Coalesced hashing schemes have been shown to be
very fast for dynamic information storage and
retrieval. Their parameters relate the sizes of the
address region and the cellar. The techniques discussed are
designed to tune the parameter in order to achieve optimum
search times, but do have some open problems [Ref. 50].

B. THE EMPLOYMENT OF INDICES FOR PRECISE ACCESS

An index is a file in which each entry (record)
consists of a data value together with one or more
pointers. The data value is a value for some field of some
record or records in the indexed file, and the pointers
identify records in the indexed file having that value for
the field in the record(s). Thus, there are two types of
files: the index file and the indexed file, i.e., the record
file. There are also two types of records. The record in the
index or index file is an index entry and the record in the
record or indexed file is the data aggregate of fields. The
fields whose values are kept in the index are referred to as
key fields of the records. Indexed access pertains to
using an index of key fields that provides the disk addresses
of records stored in a file. Generally speaking, the
contents of the index file are an abstraction of the file of
basic source documents, i.e., it can be considered as a
sort of shorthand substitution for the original document,
containing only such information as essential attributes
and statistics required to satisfy the user's need. In
addition, the advantage of indexing is that, although it
does not access as fast as hashing schemes, it does have a
very high loading factor.

It is possible to construct two types of indexes:
nondense and multilevel indexes. The idea behind the nondense
index (see Figure 42) is that the file being indexed is
divided into groups, with several stored record occurrences
in each group, such that for any two groups, all the stored
record occurrences in one precede all those in the other,
with respect to the sequencing being imposed on the file.
The term nondense refers to the fact that the index does
not contain an entry for every stored record occurrence in
the indexed file. Thus, the stored record occurrences must
contain the indexed field [Ref. 51].
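
A nondense index lookup can be sketched as follows. The sketch
assumes, for illustration only, that each index entry holds the
highest key of its group, that groups are kept in key sequence,
and that records are (key, data) pairs; the function names are
hypothetical.

```python
from bisect import bisect_left


def build_nondense_index(groups):
    """One index entry per group: the highest key stored in that group."""
    return [group[-1][0] for group in groups]


def lookup(index, groups, key):
    """Find the single group that could hold key, then scan that group."""
    g = bisect_left(index, key)  # first group whose highest key >= key
    if g == len(groups):
        return None              # key is larger than every stored key
    # The index has no entry for this individual record, so the stored
    # records themselves must carry the indexed field for the comparison.
    for k, data in groups[g]:
        if k == key:
            return data
    return None


groups = [[(3, "c"), (7, "g")], [(12, "l"), (15, "o")], [(20, "t"), (31, "x")]]
index = build_nondense_index(groups)  # [7, 15, 31]
print(lookup(index, groups, 12))
```

The final scan inside the group is what makes it necessary for
the stored record occurrences to contain the indexed field.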

Multilevel indexing (see Figure 43), on the other hand,
refers to the construction of an index to the index. Here,
the indexed file is divided into groups of one track each.
The track index contains an entry for each such track. The
track index in turn is divided into groups, each of which
consists of the entries for all tracks of one cylinder in
the indexed file, and a cylinder index contains an
entry for each such group in the track index. Each group
within the track index is normally recorded at the
beginning of the appropriate cylinder of the indexed file,
to cut down on seek activity. In general, a multilevel
index can contain any number of levels, each of which acts
as a nondense index to the level below. An index, now, can
be used in two ways. First, it can be used for
sequential access to the indexed file, in accordance with
the values of the indexed field. In other words, it can
impose an ordering on that indexed file. Second, it can be
used for direct access to individual records in the indexed
file on the basis of the value for the same key field. Other
file organizations and other index techniques that
are presented are the heap, direct, primary and secondary
keys, B and B^ trees, clustering, and directory hierarchy.


1. The Heap File Organization


The most obvious and basic approach to storing a
file of records is simply to list them in as many
blocks as they require, although one does not generally
allow records to overlap block boundaries. This
organization is sometimes called a heap, when it is
necessary to dignify it with a name. The blocks used for a
heap may be linked by pointers; or, a table of their
addresses may be stored elsewhere, perhaps on one or more
additional blocks. To insert a record, the record is placed
in the last block if there is space, or in a new block if
there is no more space. Deletions can be performed by
setting a deletion bit in the record to be deleted. Reusing
the space of deleted records by storing newly
inserted records in their space is dangerous if pointers to
these records still exist.

Given a key value, the record lookup requires a scan
of the entire heap-organized file, or at least half the
file on the average, until the desired record is found.
It is this operation whose cost is prohibitive if the
file in question is spread over more than a few blocks
[Ref. 52].
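
The heap operations above can be sketched in a few lines. The
sketch is an in-memory model under stated assumptions: a block is
a Python list holding at most records_per_block records, each
record is a [key, data, deleted] triple, and the class name
HeapFile is hypothetical.

```python
class HeapFile:
    def __init__(self, records_per_block=3):
        self.records_per_block = records_per_block
        self.blocks = [[]]  # each record is [key, data, deleted]

    def insert(self, key, data):
        # Place the record in the last block, or in a new block if full.
        if len(self.blocks[-1]) == self.records_per_block:
            self.blocks.append([])
        self.blocks[-1].append([key, data, False])

    def lookup(self, key):
        """Scan block by block; return (data, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for k, data, deleted in block:
                if k == key and not deleted:
                    return data, blocks_read
        return None, len(self.blocks)

    def delete(self, key):
        # Only the deletion bit is set; the space is not reused.
        for block in self.blocks:
            for record in block:
                if record[0] == key and not record[2]:
                    record[2] = True
                    return True
        return False
```

The second value returned by lookup makes the cost visible: an
unsuccessful search reads every block, and a successful one
reads, on the average, about half of them.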

Note that the data file has no particular order.
All records are stored randomly in the file. Inserting a
new record is simple; deleting a record requires a lot of
data movement. In short, the heap structure is beneficial if the
records are small, and the file is temporary.

2. The Indexed Sequential

The development of direct-access storage devices
made it feasible to transform a sequential file into a file
that could be accessed both sequentially and randomly via a
primary key. The indexed sequential file organization is
such a file organization. It is probably the most popular
and simplest file organization for single-key files. It is
referred to as ISAM, the indexed sequential access method.

Prior to discussing the above file, let us first
review sequential access. Sequential access pertains
to storing and retrieving records in a one-after-the-other
order. Records are generally stored in ascending or
descending order by a record key. A record key is a
unique, unchanging piece of information such as an
account number, name, or social security number.
Sequential access is the only access technique used with
magnetic tape drives, which are, by their design, sequential
access devices.

The storage and retrieval of records in a sequential
order is similar to the approach used in manual systems.
Accordingly, sequential access has traditionally
appealed to organizations converting from manual to
computer-based systems. This appeal, combined with the
early dominance of magnetic tape as a cost-effective storage
medium, has led to the use of sequential-access files in
most initial computing efforts.

Sequential access is used primarily in batch-processing
environments. It is particularly effective when
the transaction activities are evenly distributed among the
records in the file. When a file is to be updated, the
transactions are sorted into the same sequence as the records
required by the transactions. The sorted transactions are
then sequentially matched against the file [Ref. 53].
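
The sort-and-match update can be sketched as a single pass over
the master file. The sketch assumes, for illustration, that
records and transactions are (key, value) pairs, that a
transaction simply replaces the value of the matching record, and
that transactions without a matching record are set aside; the
function name batch_update is hypothetical.

```python
def batch_update(master, transactions):
    """master: (key, value) pairs in ascending key order.
    transactions: (key, new_value) pairs in arrival order.
    Returns (updated master, transactions that matched no record)."""
    # Sort the transactions into the sequence of the master file; the
    # sort is stable, so transactions for one key keep arrival order.
    txns = sorted(transactions, key=lambda t: t[0])
    new_master, unmatched = [], []
    i = 0
    for key, value in master:
        # Transactions with a smaller key have no master record.
        while i < len(txns) and txns[i][0] < key:
            unmatched.append(txns[i])
            i += 1
        # Apply every transaction for this key, in arrival order.
        while i < len(txns) and txns[i][0] == key:
            value = txns[i][1]
            i += 1
        new_master.append((key, value))
    unmatched.extend(txns[i:])
    return new_master, unmatched


master = [(1, 10), (3, 30), (5, 50)]
transactions = [(5, 55), (2, 22), (3, 33), (3, 34)]
print(batch_update(master, transactions))
```

Both files are read exactly once and strictly in key order, which
is why the technique suits magnetic tape.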

In the sequential data file, records are stored in
the order of the primary key attribute, not necessarily
physically contiguous, i.e., it could be a linked list.
Insertion and deletion are straightforward if the data
file is organized as a linked list. It is similar to the
heap structure in operation, but its primary key can be
processed more efficiently than in the heap organization
[Ref. 57]. The indexed sequential file is a sequential
organization with two additional
features. One feature is an index to provide random access
to keyed records, and the other feature is an overflow
area that provides a means for handling record additions to
a file without copying the file. IBM refers to these as ISAM
files. An ISAM file (see Figure 44) consists of three
component areas: an index area, a prime area containing
data records and related track indexes, and an
overflow area.


An access to an ISAM file can be made in
either sequential mode or direct mode. When the access
mode is sequential, records are retrieved in basically
the same manner as they are for a keyed sequential file.
The sequential accessing can begin at any record in the
file. To start sequentially accessing an ISAM file at a
specific record in the file, a user must specify the key
value of the record. When the access mode is direct, the
primary key value of the desired record is supplied by the
user, and an index translates the key value into a
block address. The block is accessed and brought into
main memory, where it is scanned for the record containing
the specified primary key value. An index is created
automatically for a file as records are written into the
prime area. Records are written into a prime area in the
lexical order determined by the value of the primary key in
each record. An index is created on the same primary key
that is used to order the records in the prime area.

A number of index levels can exist in the index
area of an ISAM file. The track index is the lowest level
of the index, is always present, and is written on the
first track of the cylinder that it indexes. A track index
contains two entries for each prime track of a cylinder:
a normal entry and an overflow entry. A normal entry
contains two elements:

(1) the address of a prime track, and

(2) the key value of the last record on the track.

When an ISAM file is created, the highest key
value that can appear on a prime area track is fixed,
and it is maintained in the key value part of the related
overflow entry. The key value of an overflow entry can
change only if the file is reorganized.

Ihe  track  address  part  of  an  overflow  entry  Is 
initially  set  to  contain  tt-e  value  255,  and  It  Is  rh^fmed 
When  the  addition  of  a  record  to  the  home  track  causes  tn»* 
last  record  on  the  track  to  be  Placed  In  an  overflow  area. 
The  last  entry  of  each  track  Index  is  a  gummy  entry 
Inalcating  the  end  of  the  Index, 

Just as a track index describes the storage of records on the
prime tracks of a cylinder, a cylinder index indicates how
records are distributed over the cylinders that make up an
ISAM file. There is one cylinder index entry per track index;
that is, if the data records in a file are stored on 20
cylinders, there will be 20 entries in the cylinder index.
Each cylinder index entry contains the key value of the last
record in the related cylinder and the corresponding cylinder
address.
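
The two index levels described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists, the sample key values, and the names cylinder_index, track_indexes, and locate are hypothetical, and a real ISAM file keeps these indexes on disk.

```python
import bisect

# Hypothetical in-memory stand-ins for the index levels.
# Each entry is (highest key value covered, address).
cylinder_index = [(300, 0), (600, 1), (900, 2)]
track_indexes = {
    0: [(100, 1), (200, 2), (300, 3)],
    1: [(400, 1), (500, 2), (600, 3)],
    2: [(700, 1), (800, 2), (900, 3)],
}

def first_entry_covering(entries, key):
    """Return the address of the first entry whose high key is >= key."""
    highs = [high for high, _ in entries]
    pos = bisect.bisect_left(highs, key)
    if pos == len(entries):
        return None  # key is larger than every key in this index
    return entries[pos][1]

def locate(key):
    """Translate a primary key into a (cylinder, track) address."""
    cylinder = first_entry_covering(cylinder_index, key)
    if cylinder is None:
        return None
    track = first_entry_covering(track_indexes[cylinder], key)
    return (cylinder, track)
```

The track that locate returns would then be read and scanned for the record, as described for direct access above.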

A final level of indexing that can exist, but does not have
to exist, in this hierarchical indexing structure is the
master index. Each entry in a master index contains the
address of a track in a cylinder index and the key value of
the highest keyed cylinder index entry on that track. The
master index is used when the number of entries in a cylinder
index is large, thus causing a time-consuming serial search
through the cylinder index for the correct cylinder
containing a desired record. The master index forms the root
node of the indexes used in ISAM files. The indexes for ISAM
files partition the prime area into small groups of records,
i.e., tracks of records, so that an individual record can be
accessed without accessing all the records that precede it.

The problems associated with adding records to sequential
files are partially avoided in ISAM files by the provision of
an overflow area. Two organizations of overflow areas are
possible: a cylinder overflow area and an independent
overflow area. An advantage of the cylinder overflow area is
that additional seeks are not required to locate overflow
records. A disadvantage is that space may be wasted if
additions are not evenly distributed throughout a file. An
advantage of an independent overflow area is that less space
need be reserved for overflows, and a disadvantage is that
accessing overflow records requires additional seeks. A
compromise is to provide cylinder overflow areas large enough
to contain the average number of overflows, and an
independent overflow area to be used as the cylinder overflow
areas are filled.

Updating an ISAM file may affect both the prime area and the
indexes. For example, the addition of a record to an ISAM
file may cause one or more of the key values in the index
entries to be altered. ISAM files can be updated in either
sequential or direct mode. The sequential mode should be used
when updates can be batched. In this case one pass is made
over the file. When updates must be made on an individual
basis, they should be done in direct mode.

When a record is added to an ISAM file, the prime track on
which it should be placed is determined by the access method,
ISAM in this case. The addition is placed on the prime track
if the key value of the record is less than the key value
kept in the normal entry of the related track index entry. If
the record must be placed on the prime track, then a record
already on the track may have to be removed and placed in an
overflow area. All overflow records for a prime track are
linked together in the overflow area, and a pointer to this
list of overflow records is maintained in the address field
of the overflow entry. The list of overflow records, if any,
for each track is maintained in sorted order on the primary
key. Thus, all records associated with a prime track, whether
they are on the prime track or in the overflow area, are in
logical sorted order. If the key value of a record to be
added is greater than the key value in the normal entry of
the related track index entry, then the record goes directly
to the overflow area. Records are never moved from an
overflow area to a prime track unless a file is reorganized.
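
The placement rule for additions can be sketched as follows. This is a simplified model, not an actual access-method implementation: the class name Track, its fields, and the fixed capacity are assumptions made for illustration, records are reduced to their key values, and the prime track is assumed to be non-empty.

```python
import bisect

class Track:
    """One prime track plus its sorted chain of overflow records."""

    def __init__(self, capacity, keys, high_key):
        self.capacity = capacity   # records that fit on the prime track
        self.prime = sorted(keys)  # keys currently on the prime track
        self.high_key = high_key   # overflow-entry key, fixed at creation
        self.overflow = []         # overflow chain, kept in key order

    @property
    def normal_key(self):
        # Key of the last record on the prime track (normal entry).
        return self.prime[-1]

    def add(self, key):
        if key > self.high_key:
            raise ValueError("key belongs to a later track")
        if key < self.normal_key:
            # Goes on the prime track; the last record may be bumped.
            bisect.insort(self.prime, key)
            if len(self.prime) > self.capacity:
                bumped = self.prime.pop()
                bisect.insort(self.overflow, bumped)
        else:
            # Greater than the normal-entry key: directly to overflow.
            bisect.insort(self.overflow, key)
```

For example, on a full track holding keys 10, 20, and 30, adding 15 pushes 30 into the overflow chain, while a later addition of 25 goes straight to overflow because it exceeds the new normal-entry key of 20.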


Records to be deleted, on the other hand, are not physically
removed from an ISAM file; instead, records to be deleted are
marked by a deletion code '11111111'B, with B indicating a
binary constant, in the first byte of a fixed-length record
or in the fifth byte of a variable-length record. If a marked
record is forced off its prime track during a subsequent
update, it is not rewritten in the overflow area unless it
has the highest key value on that cylinder. If a record that
has the same key value as a previously deleted record is
later added, the space occupied by the deleted record may be
recovered. During direct processing, marked records are
retrieved and the programmer must check them for the delete
code.

A record in an ISAM file can be modified in either sequential
or direct processing mode. The file may have to be
reorganized occasionally if the overflow area becomes filled
or additions increase the time required to directly locate
records.

Reorganization can be accomplished by sequentially copying
the records of a file, leaving out all records that are
marked for deletion, into another storage area and then
re-creating the file by sequentially copying the records back
into the original file area. The reorganization of an ISAM
file with records in the overflow area usually causes new
indexes to be created [Ref. 54]. In short, ISAM is an index
file which contains keys to provide faster direct access to
the record. Its data file is the same as a sequential file.
Insertion may cause an overflow, and its operation is similar
to that of a sequential file, but now exact-match and range
queries can be processed faster.

3. The Direct Files

Sequential access is inefficient for batch-processing
applications in which only a small proportion of the records
in a file are affected by a given batch of transactions. The
entire file may have to be passed to update a few records.
For on-line processing, sequential access is inadequate. The
time lapse of several minutes which is generally required to
locate a record sequentially is unacceptable. Sequential
access fails to take advantage of two exceptional
capabilities of computer technology, namely, speed and direct
access. Direct access is an alternative to sequential access
that significantly accelerates the process of storing and
retrieving records by capitalizing on both the computational
speed of the CPU and the access speed of the disk drive. Disk
drives are capable of directly accessing any record in a file
in a matter of milliseconds. However, to access a particular
record, the disk location of the record is required. The
location is indicated by the address assigned to it, which is
either saved when the record is stored or recalculated.

In direct files there is a definite relationship between the
primary-key value of a record and its address on a
direct-access storage device. Records are stored and
retrieved through the use of this relationship. Direct files
allow individual records to be accessed very quickly. Since
direct accessing requires the address of the specific
location of the desired record, direct access requires an
addressing scheme that computes a unique address for each
record. Generally, the record key must be transformed into a
disk storage address.

The most common approach to transforming record keys into
storage addresses involves an arithmetic procedure that
generates "random addresses" from record keys. There are
several randomizing algorithms. The most common algorithm
involves the generation of addresses by dividing the record
key by a positive prime number. Usually, the prime number is
the largest prime number that is less than the number of
available addresses. The remainder of the division operation
is used as the address locator. A randomizing algorithm
always generates the same address for a particular key.
Therefore, given the key to a record, the computer can
calculate the disk address and then access the record in a
matter of milliseconds.
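
The division-remainder method described above can be written down directly. The sketch below assumes integer keys; the function names largest_prime_below and random_address are illustrative and do not come from the source.

```python
def largest_prime_below(n):
    """Return the largest prime strictly less than n (requires n > 2)."""
    if n <= 2:
        raise ValueError("no prime below n")

    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    candidate = n - 1
    while not is_prime(candidate):
        candidate -= 1
    return candidate

def random_address(key, divisor):
    """Address locator: the remainder of key divided by the prime divisor."""
    return key % divisor

# With 1000 available addresses, the divisor is 997.
divisor = largest_prime_below(1000)
address = random_address(123456, divisor)  # 123456 = 123 * 997 + 825
```

Because the remainder depends only on the key and the divisor, the same key always maps to the same address, which is what allows the address to be recalculated at retrieval time.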

Unfortunately, as in the case of hashing, a random-address
generator occasionally generates the same address for two or
more keys. The second and succeeding records with duplicate
addresses are referred to as synonyms (as collisions in
hashing). When a synonym occurs, the record having the
duplicate address can be stored in a location next to other
synonyms of the record stored at the computed address, or it
can be stored in a general overflow area. In either case, if
a desired record, which is identified by checking the key
stored in the record, is not located at the computed address,
a sequential search of synonyms is invoked until the desired
record is located. This sequential search slows processing
slightly. A good randomizing algorithm generates few synonyms
(collisions) for a particular set of keys. If it generates
many, further analysis and modification of the randomizing
algorithm is needed.
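
The handling of synonyms with a general overflow area can be sketched as below. The class DirectFile and its layout (one record slot per computed address, a single overflow list) are assumptions chosen to keep the example small; the alternative of storing synonyms next to the home location is not shown.

```python
class DirectFile:
    """Toy direct file: one home slot per address plus a general overflow area."""

    def __init__(self, divisor):
        self.divisor = divisor
        self.home = [None] * divisor  # slot per computed address
        self.overflow = []            # synonyms, in arrival order

    def store(self, key, data):
        addr = key % self.divisor
        if self.home[addr] is None:
            self.home[addr] = (key, data)
        else:
            # Synonym: the computed address is already occupied.
            self.overflow.append((key, data))

    def fetch(self, key):
        addr = key % self.divisor
        record = self.home[addr]
        if record is not None and record[0] == key:
            return record[1]
        # Not at the computed address: search the synonyms sequentially.
        for stored_key, data in self.overflow:
            if stored_key == key:
                return data
        return None
```

With a divisor of 7, keys 10 and 17 both map to address 3; the second is a synonym and is found only after the key stored at the home address fails to match.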

Another drawback of a direct-access technique is that, by
design, it usually leaves large gaps between records on a
disk. This results in wasted disk space. As in the case of
hashing, some of the gaps may be consumed by synonyms, but
considerable wasted space still remains. An offsetting
advantage is the incredible speed with which records can be
accessed, regardless of the size of the file. A very
important concept to grasp with respect to direct accessing
is that the physical locations of records bear no
relationship to the logical view of the data. The random
generation of addresses physically scatters records
throughout the disk such that, without knowledge of the
randomizing algorithm used to transform the keys, locating a
record requires a sequential search through a
non-sequentially ordered file. The use of a randomizing
algorithm is an extraordinary deviation from the way that
files are maintained manually. However, it is a highly
suitable technique for computer-processed files. Any one
record of several thousand or even several million can be
located almost instantaneously at the expense of some
disk-space wastage [Ref. 53].

4. Primary and Secondary Keys

The run-time performance of a file system is influenced by
the software used to organize and subsequently access the
requested data. Fast-access methods can generally be designed
when all logical conditions are expressed in terms of primary
keys alone, i.e., all access requests are to single records
via their primary keys. It is much more difficult to design
fast-access methods when logical conditions require secondary
keys, that is, when access requests are to sets of records. A
primary key is a data item that uniquely identifies a record.
The primary key of a record corresponds to the identifier of
a real-world entity. As with identifiers, there may be
several possible, or candidate, primary keys for the same
record. Also, two or more data items may be required to
identify a record. A secondary key is a data item that does
not uniquely identify a record but identifies a number of
records in a set that share the same property. For example,
the data item MAJOR might be used as a secondary key for
STUDENT records. Of course, this data item does not identify
a unique record; for example, many students will have
business as a major. However, the secondary key does identify
a subset of students who are business majors. Secondary keys
arise when data are referenced by categories.
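
The difference between the two kinds of key can be illustrated with a small index on MAJOR. The STUDENT records, the field names, and the function build_secondary_index below are hypothetical; the sketch only shows that a primary key selects one record while a secondary key value selects a set of records.

```python
from collections import defaultdict

# Hypothetical STUDENT records, addressed by primary key (student number).
students = {
    1001: {"NAME": "ADAMS", "MAJOR": "BUSINESS"},
    1002: {"NAME": "BAKER", "MAJOR": "PHYSICS"},
    1003: {"NAME": "CLARK", "MAJOR": "BUSINESS"},
    1004: {"NAME": "DAVIS", "MAJOR": "HISTORY"},
}

def build_secondary_index(records, field):
    """Map each value of a secondary key to the primary keys that share it."""
    index = defaultdict(list)
    for primary_key in sorted(records):
        index[records[primary_key][field]].append(primary_key)
    return dict(index)

major_index = build_secondary_index(students, "MAJOR")
business_majors = major_index["BUSINESS"]  # a set of records, not one
```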

Not all secondary keys need to be indexed, but before a
database designer can decide which indexes to create, all
secondary keys must be identified. When all data processing
is known in advance, computer program specifications provide
an excellent source for identifying secondary keys. Figure 45
illustrates some general guidelines for identifying secondary
and even alternative primary keys [Ref. 53].

5. B-trees and B+-trees

If an index is helpful in storing and searching through data
records, then it is possible for one index to help to
organize another index. ISAM and other file organizations, as
well as a host of other data structures, are all based on
this approach of indexing indexes. This type of hierarchy of
data and pointers to data is generalized by the tree data
structure. Trees can be used to organize data directly or to
organize indexes into data.


Data item names (columns are data items, rows are records):

PRODUCT#   DESCRIPTION   FINISH   ROOM   PRICE
--------   -----------   ------   ----   -----
0100       TABLE         OAK      DR       500
0350       TABLE         MAPLE    DR       625
0625       CHAIR         OAK      DR       100
0973       WALL UNIT     PINE     FR       750
1000       DRESSER       CHERRY   BR       800
1250       CHAIR         MAPLE    LR       400
1425       BOOKCASE      PINE     LR       250
1600       STAND         BIRCH    BR       200
1775       DRESSER       PINE     BR       500
2000       WALL UNIT     OAK      LR      1200

Primary    Secondary keys
key


Figure 45. An Example of Primary and Secondary Keys


A tree data structure has the property that each element of
the structure, except the root, has only one path coming in
(that is to say, there is only one pointer that points to any
given element), but there may be zero or many paths coming
out of an element.

Two tree structures of this kind are binary search trees and
multiway (m-way) search trees. A binary tree is one of the
best known and most frequently used data structures for
organizing data that are stored entirely in the main memory
while being processed. While binary trees have seldom been
used for organizing large files in two-level storage, the
multiway tree, which is essentially a generalization of the
binary tree, has received widespread use for organizing
indexes for large files in external memory.

The principal reason for the frequent use of the multiway
search tree and the infrequent use of the binary search tree
in two-level storage involves the number of accesses to the
external storage needed to locate a record with a specific
key value.

The B-tree, where B stands for balanced (meaning all leaves
are the same distance from the root), is a multiway search
tree with restricted growth (see Figure 46). B-trees
guarantee a predictable efficiency that many other types of
trees do not. A B-tree of order m is a tree that satisfies
the following properties:

(1) Every node has less than or equal to m sons.

(2) Every node, except the root and leaves, has
at least m/2 sons.

(3) The root has at least 2 sons unless it is
a leaf (terminal node).

(4) All leaves are at the same level and only
contain pointers to the actual data records,
i.e., carry no information.

(5) A nonleaf node (an internal node) that has k
children will contain k-1 keys.

The effectiveness of a B-tree search is determined by the
shape of the tree, and the shape of the tree is determined by
the order, m. When m is small, a tree is tall and narrow, and
when m is large, a tree is short and bushy. The maximum
number of nodes, K, that must be accessed during a B-tree
search is

$$ K \le 1 + \log_{m/2}\left(\frac{n+1}{2}\right) $$

where m is the order and n is the number of key values in the
tree.
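
To give a feel for the bound, the short sketch below evaluates it numerically. The function name max_nodes_accessed is an assumption, the result is rounded down because K counts whole nodes, and m must be greater than 2 so that the logarithm base m/2 exceeds 1.

```python
import math

def max_nodes_accessed(m, n):
    """Evaluate floor(1 + log base m/2 of (n + 1)/2) for order m and n keys."""
    if m <= 2:
        raise ValueError("order m must exceed 2")
    return math.floor(1 + math.log((n + 1) / 2, m / 2))

# A short, bushy tree: order 200 with one million key values.
bushy = max_nodes_accessed(200, 1_000_000)
# A tall, narrow tree: order 4 with one thousand key values.
narrow = max_nodes_accessed(4, 1000)
```

Under these assumptions the order-200 tree needs at most 3 node accesses for a million keys, while the order-4 tree may need 9 accesses for only a thousand keys.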

It would therefore seem preferable to use a large value for
m. If m = (n+1), then only one level exists in a tree; this
choice of m, however, is not reasonable if a tree is too
large to fit in the main memory. When a value for m is
selected, the primary objective is to minimize the total
amount of time required to search a B-tree for a key value
KV. This time has two components:

(1) the time required to access a node in the
external storage, and

(2) the time required to search this node, in
internal memory, for KV.

It turns out that there is a value of m, m1, for which the
search time is a minimum. For values of m exceeding m1, the
total amount of time required to search a B-tree increases
[Ref. 54].

Searching a B-tree for a specified key value proceeds as
follows. A node, starting with the root node, is brought into
the main memory and searched, possibly using a binary search,
for the given argument key value among the key values
K1, K2, ..., Kj. If the search is successful, then the
desired key value is located, but if the search is
unsuccessful because the argument key value lies between Ki
and K(i+1), then the node pointed to by Pi, which points to a
subtree holding key values between Ki and K(i+1), is
retrieved and the search continued. The pointer P0 is used if
an argument key value precedes K1, and Pj is used if an
argument key value follows Kj in sorted order. If Pi = null,
the search is unsuccessful.
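
The search procedure can be sketched as follows. The Node class is a hypothetical in-memory stand-in for a node read from external storage; keys holds K1..Kj and children holds the pointers P0..Pj, or None at the lowest level where every pointer is null.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # K1..Kj in sorted order
        self.children = children  # P0..Pj, or None when all are null

def search(node, key):
    """Return the node holding key, or None if the search is unsuccessful."""
    while node is not None:
        # Binary search within the node: i counts the keys smaller than key.
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node
        # key lies between Ki and K(i+1): follow pointer Pi.
        node = node.children[i] if node.children else None
    return None

left = Node([5, 10])
middle = Node([25, 30])
right = Node([50, 60])
root = Node([20, 40], [left, middle, right])
```

Each pass through the loop corresponds to one node access, so the number of iterations is bounded by the height of the tree.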

The insertion process for B-trees is relatively simple and
straightforward; each terminal node corresponds to a place
where a new key value may be inserted. In general, a key
value inserted into a B-tree of order m with the terminal
nodes at level L is placed in an appropriate node on level
L-1. If this node now contains m key values, then it must be
split into two distinct nodes. For example, if a node after
the insertion of a new key value looks like

P0 K1 P1 K2 ... P(m-1) Km Pm

then it is split into two nodes

P0 K1 P1 K2 ... K((m/2)-1) P((m/2)-1)

and

P(m/2) K((m/2)+1) ... Km Pm

and the key value K(m/2) is inserted into the father of the
original node. This insertion may cause a father node to
contain m key values, and if so, it is split in the manner
illustrated above. If a root node must be split (a root of
course has no father), then a new root node is created
containing the single key value K(m/2). The tree becomes one
level taller in this case. Thus, a B-tree grows upward from
the root instead of downward from the leaves. The procedure
described above for inserting new key values into a B-tree is
exactly the procedure used to create a B-tree [Ref. 43].
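
The split step can be sketched on plain lists. The function split_node and the list representation are assumptions for illustration; for odd m the sketch rounds m/2 up, which is a choice the text does not specify.

```python
import math

def split_node(keys, pointers):
    """Split an overfull node holding m keys K1..Km and pointers P0..Pm.

    Returns (left, promoted_key, right), where left and right are
    (keys, pointers) pairs and promoted_key is K(m/2), which goes
    into the father of the original node.
    """
    m = len(keys)
    if len(pointers) != m + 1:
        raise ValueError("a node with m keys needs m + 1 pointers")
    mid = math.ceil(m / 2) - 1  # 0-based position of K(m/2)
    left = (keys[:mid], pointers[:mid + 1])
    right = (keys[mid + 1:], pointers[mid + 1:])
    return left, keys[mid], right

# Order m = 4: the node has just received its fourth key value.
left, promoted, right = split_node(
    [10, 20, 30, 40], ["P0", "P1", "P2", "P3", "P4"]
)
```

Both halves keep one more pointer than keys, so each is again a well-formed node; only the promoted key has to be inserted one level up.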

The deletion of a key value from a B-tree is more complicated
than inserting a new value into a B-tree. The deletion of a
key value on level L-1 simply causes it to be erased from a
node. When this erasing makes a node too empty, that is,
underflow occurs, the right or left brother is examined and
key values are moved from the brother until both nodes have
approximately the same number of key values. The key values
are not moved directly from the brother to the underflowed
node. Instead, the preceding key value in the parent node is
moved to the underflowed node and the preceding key value in
the brother replaces the key value in the father node. A
delete operation that results in underflow fails only if the
brother is minimally full [Ref. 54].

In short, as far as its operation is concerned, the B-tree is
similar to the indexed-sequential file, but with better
performance for insertion and deletion. Also, it does not
require a separate index file. However, the scanning of
subfiles is not as efficient as with the indexed-sequential
file.

A B+tree is represented by nodes which are implemented by
different blocks in the file. The free-block list (FBL) is
used to maintain a list of free blocks for dynamic (space)
management of blocks. Initially, all blocks of the file are
placed on the list by initializing the FBL fields
accordingly. Subsequently, block allocation from the list
takes place at the beginning of the list. Once a block is
allocated to the running process, the same field is used for
a different purpose if it happens to be a sequence block. It
is used to store the block number of the sequence block next
in the lexicographical order. If the block is an index type,
the field is not used any longer. The second field, BTYPE,
indicates the type of the block (node), whether it be an
index or sequence. Each B+tree in the index file is
identified by its root node, and the root node number is
stored in the master index, in which there is an entry for
each secondary key and the primary key defined for the file.
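
The free-block list can be sketched as a chain threaded through the first field of each block. Everything below is an illustrative assumption rather than the implementation described in the source: the class BlockFile, the NIL marker, the in-memory lists standing in for block fields, and in particular the release operation, which the text does not describe.

```python
NIL = -1  # assumed end-of-list marker

class BlockFile:
    def __init__(self, n_blocks):
        if n_blocks < 1:
            raise ValueError("need at least one block")
        # link[i] is block i's first field: the next free block while
        # the block is on the FBL.
        self.link = [i + 1 for i in range(n_blocks)]
        self.link[-1] = NIL
        self.btype = [None] * n_blocks  # second field: "INDEX" or "SEQUENCE"
        self.free_head = 0              # allocation happens at the list head

    def allocate(self, btype):
        if self.free_head == NIL:
            raise RuntimeError("no free blocks")
        block = self.free_head
        self.free_head = self.link[block]
        # For a sequence block the field is reused as the link to the
        # next sequence block; for an index block it stays unused.
        self.link[block] = NIL
        self.btype[block] = btype
        return block

    def release(self, block):
        # Assumed operation: put a block back at the head of the FBL.
        self.btype[block] = None
        self.link[block] = self.free_head
        self.free_head = block
```

Reusing one field for both purposes is safe because a block is never on the free list and in a sequence chain at the same time.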


Research has been undertaken by Koymen [Ref. 55] to design
and implement a B+tree-based keyed sequential access method
(KSAM). KSAM provides primary and secondary access for either
direct or sequential processing. Primary access to a data
file requires three levels of indexes: super, master, and
primary indexes. Secondary access requires an additional
index level, i.e., secondary indexes. The superindexes and
master indexes are transparent to the user and are used
solely by the system. The primary index is organized as a
B+tree, containing proper linkage to the respective data
files. In the implementation of secondary indexes, a file is
used to store accession lists, a term applied to the records
in a secondary index, which contain pointers used in
accessing the data records. Each secondary index is in turn
organized as a B+tree containing proper linkages to
accession-list files. Thus, linkage from the B+tree of a
secondary index to the respective data files is provided via
the accession-list file. Finally, another file is used to
represent all the B+trees associated with a data file. Thus,
three files suffice for the implementation of a KSAM data
file and its associated indexes. The implementation schema
organizes each of the three files as a direct-access file.
Thus the high popularity of direct-access files makes the
implementation possible in almost any programming language
[Ref. 55].


In short, then, a B+tree has the same index file as the
B-tree (with key values only), has a data file similar to
that of the sequential file, and, as far as its operations
are concerned, has the benefits of both the
indexed-sequential file and the B-tree index structure.

6. Clustered Files

The notion of classification has been used for centuries in
many disciplines of the social and natural sciences.
Classification involves placing a set of objects into classes
or "clusters" in such a way that the objects within a class
are more similar to each other than they are to objects
outside the class. The similarity between objects is defined
in terms of known properties of the objects. Such a grouping
is referred to as a "clustering" or a classification. A file
in which documents that exhibit the same sets of index terms
are grouped into clusters is called a clustered file. To
facilitate searching in a clustered file, each cluster is
identified by a representative called the "centroid". A
search is usually carried out by first comparing a query with
all the cluster centroids; then, for those centroids
exhibiting a sufficiently high similarity with the query, all
objects in the corresponding clusters are examined and those
that have a sufficiently high query-document similarity are
retrieved. This approach is based on the assumption that
documents in the same cluster are of interest to the same
user and would therefore be requested jointly.
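
The two-stage search can be sketched as follows. The similarity measure (a Jaccard coefficient on sets of index terms), the two thresholds, and the data layout are all assumptions made for the example; the text does not prescribe a particular measure.

```python
def similarity(a, b):
    """Jaccard coefficient between two sets of index terms (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def clustered_search(query, clusters, centroid_threshold, doc_threshold):
    """clusters is a list of (centroid_terms, [(doc_id, doc_terms), ...])."""
    hits = []
    for centroid, documents in clusters:
        # Stage 1: compare the query with the cluster centroid.
        if similarity(query, centroid) >= centroid_threshold:
            # Stage 2: examine every document in the selected cluster.
            for doc_id, terms in documents:
                if similarity(query, terms) >= doc_threshold:
                    hits.append(doc_id)
    return hits

clusters = [
    ({"file", "index"},
     [("d1", {"file", "index", "isam"}), ("d2", {"file", "hash"})]),
    ({"tree", "node"},
     [("d3", {"tree", "node", "btree"})]),
]
found = clustered_search({"file", "index"}, clusters, 0.5, 0.5)
```

Only the first cluster passes the centroid test here, so document d3 is never examined; that saving is the point of comparing against centroids first.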

Since the objects within a cluster are retrieved jointly, it
is desirable that they be kept in close proximity within the
physical storage. A simple scheme by which objects of the
same cluster can be kept close to each other is to store
objects by clusters, so that each object is stored as many
times as the number of clusters in which it appears, and all
objects in the same cluster are stored consecutively. This
scheme is called "cluster-inverted" organization.

Clearly, if the clusters are pairwise disjoint, that is, the
classification is a partition, the above scheme has no
redundancies. This is the case for formatted databases.
However, if clusters are allowed to overlap, which is the
case for textual databases, texts that are pertinent to more
than one cluster have to be stored repeatedly. One solution
to this problem is to store the objects (i.e., texts) of each
cluster in contiguous locations while minimizing redundancy.
This is accomplished by characterizing conditions under which
an arrangement without redundancy exists. Since the records
are to be jointly retrieved in the response set of any query,
the property is termed the consecutive retrieval property,
CRP. There are two types of clustered files: single-level and
multilevel. A test for a single-level clustered file to have
CRP has been developed, in the form of a linear-time
algorithm to test CRP for a given clustered file and to
identify the proper arrangement of objects, if CRP exists.
Experimental results by Deogun, Raghavan and Tsou [Ref. 56]
indicate that the algorithm generates a minimum redundancy
organization most of the time. Moreover, on the average, the
near optimal solutions show approximately a 50 percent
improvement in the amount of redundancy over the worst case
organizations.
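
What the consecutive retrieval property requires of an arrangement can be made concrete with a simple check. This sketch only verifies a given arrangement; it is not the linear-time algorithm of [Ref. 56], which also finds a suitable arrangement. The function name and the representation of clusters as sets are assumptions.

```python
def has_consecutive_retrieval(arrangement, clusters):
    """True if every cluster occupies consecutive positions in arrangement.

    arrangement lists each object exactly once; clusters is a list of sets.
    """
    position = {obj: i for i, obj in enumerate(arrangement)}
    for cluster in clusters:
        if not cluster:
            continue
        spots = [position[obj] for obj in cluster]
        # Consecutive storage means the span equals the cluster size.
        if max(spots) - min(spots) + 1 != len(spots):
            return False
    return True

arrangement = ["a", "b", "c", "d"]
overlapping = [{"a", "b"}, {"b", "c", "d"}]
scattered = [{"a", "c"}]
```

The overlapping clusters above share object b yet can both be read as one contiguous run, so no object needs to be stored twice; the scattered cluster cannot be retrieved consecutively from this arrangement.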

In the case of multilevel clusterings, the problem of
minimizing the number of different arrangements of objects
that have to be stored has been investigated with the
following results. It is shown that for any nonoverlapping
multilevel clustering, there exists an arrangement of the
objects such that every level clustering has CRP with respect
to the arrangement. Similar results have been obtained for
certain classes of overlapping multilevel clusterings
[Ref. 56].

7. The Directory Hierarchy


A directory performs an important function within a file
system called mapping. This function permits a user to create
name spaces and to store (retrieve) data in (from) them. Name
mapping converts a symbolic file name into a physical file
address that identifies where the file is stored.

Each user may have a directory of his own files and may
create subdirectories to contain groups of files conveniently
treated together. A directory normally behaves like an
ordinary file. A file system controls access to the contents
of directories; however, anyone with the appropriate
authorization can access a directory just like any other
file. In designing a system of file directories, it is
natural to think in terms of a hierarchy with the entries in
the higher levels of a directory being other directories. The
entries in the lower levels are a mixture of files and
directories. The directory entries in a hierarchically
structured directory can contain either system directories or
user directories. Data files are at the lowest level (see
Figure 47).

The most important system directory is the master directory
(or root directory). Files created by users are usually
located by tracing a path through a chain of directories,
starting with the master directory, until the desired file is
found.

An interactive user or a program running on behalf of a user
references a specific file via a symbolic file name. A
symbolic file name is usually in the form of a path name that
is a sequence of names separated by some specially designated
character such as a period or slash. In the simplest case
there is a one-to-one correspondence between symbolic file
names and files.


[Figure 47: directory and address list for the PERSONNEL
file. The directory consists of a key name index with entries
for AGE, DEPARTMENT, and SALARY; an AGE index (19, 20, ...,
65); a DEPARTMENT index (AUTO, FURN., HOME, SHOE); and a
SALARY index (15000, 20000, ...). Each index entry points
into a record address list area.]

Figure 47. An Example of a Directory

However, in more advanced file systems the same file may
appear in several directories under possibly different names;
that is, a single file may be shared by two or more user
groups, with each group having a different path through the
directories to the file itself.

Directory structures imply that the output of a
directory search for a file is the file itself. This is
slightly misleading, because the terminal nodes of the
hierarchical structure, rather than containing the file,
normally contain an object commonly referred to as a
file descriptor or a pointer to a file descriptor. A
file descriptor contains information concerning the physical
location of the file and the physical characteristics of the
file [Ref. 54].
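
The path tracing described above can be sketched as follows. This
is a minimal illustration, not the mechanism of any particular
file system: the class names, the descriptor fields (device,
start_block, length_blocks), the slash separator, and the sample
directory contents are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FileDescriptor:
    """Physical location and characteristics of a file (illustrative fields)."""
    device: str
    start_block: int
    length_blocks: int


@dataclass
class Directory:
    """Maps a name to a subdirectory or to a file descriptor."""
    entries: dict = field(default_factory=dict)


def resolve(master, path_name, sep="/"):
    """Trace a path from the master directory to a terminal node."""
    node = master
    for name in path_name.strip(sep).split(sep):
        if not isinstance(node, Directory):
            raise NotADirectoryError(name)
        node = node.entries[name]  # KeyError if the name is not catalogued
    return node


# One file shared by two user groups under different path names.
personnel = FileDescriptor(device="disk0", start_block=1200, length_blocks=64)
master = Directory({
    "payroll": Directory({"staff": personnel}),
    "audit": Directory({"people": personnel}),
})
```

Both path names end at the same descriptor object, which is how a
single file can appear in several directories under different names.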

C. THE STRUCTURE OF DATA FOR RETRIEVAL

The primary objective of file organization is to
provide a means for record retrieval and update. The update
of a record involves its deletion, changes in some of its
fields, or the insertion of an entirely new record.
Certain fields in the record are designated as key
fields or search keys. Each record includes at least one
search key, which is used to generate the index of the file.
A combination of search keys specified for retrieval is
termed a query. The simplest structure of data is the
indexed sequential search, which has already been discussed.
For more elaborate retrievals, the structures include
multilist, inverted file, and cellular multilist, as well as
hybrid and ring structured files. They are discussed herein.
1. The Multilist File Organization

The multilist file organization consists of a
directory file containing index entries, and a data file. An
index entry for a key value consists of the key value, a
pointer to the list of records in the data file containing
the key value, and the number of records in the list.

Figure 48 illustrates a multilist file indexed on
the DEPARTMENT and SALARY keys. The format of each data
record in a multilist file consists of two segments [Ref.
41]. Segment one consists of one or more
key/key-value/pointer triples, in which the pointer points to
another record containing the same key/key-value pair. The
second segment contains the values of the nonkey data items,
if any. For example, a record in Figure 48 has the format:

DEPARTMENT/AUTO/1.7,
DEPARTMENT/HDWE/1.9,
SALARY/20000/1.6,
Nonkey data item values.
a. Answering Queries

A technique that can be employed for answering
queries within this file organization is to minimize the
number of records that must be searched. This is especially
important, since some lists may be lengthy and require
longer search times. For example, a query to retrieve the
records of all employees that work in an auto department
and have a salary of 15,000 is evaluated as follows.
First, the two key values DEPARTMENT/AUTO and SALARY/15000
are decoded in the proper indexes. Second,
the index entries for AUTO and 15000 are examined to
determine their respective list lengths. Then the
shortest list is examined, which in this case is AUTO (it
has 6 records as compared to 8 for 15000). Since the
records with addresses 1.7, 2.2, 2.7, and 3.7 in Figure 48
have both of these occurrences, they satisfy
the query. For query conjunctions, we search only the
shortest list.
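
The shortest-list strategy can be sketched as below. This is an
illustrative in-memory model only: build_multilist and
conjunctive_query are hypothetical helper names, addresses are
plain strings, and the two address lists are sample values chosen
to agree with the list lengths (6 and 8) and the four matching
addresses quoted in the text.

```python
def build_multilist(lists):
    """Build (directory, records) from {(key, value): [addresses in list order]}."""
    directory, records = {}, {}
    for kv, addrs in lists.items():
        directory[kv] = (addrs[0], len(addrs))       # list head pointer, list length
        for addr, nxt in zip(addrs, addrs[1:] + [None]):
            records.setdefault(addr, {})[kv] = nxt   # key/key-value/pointer triple
    return directory, records


def conjunctive_query(directory, records, terms):
    """Traverse only the shortest list; return (matching addresses, records accessed)."""
    shortest = min(terms, key=lambda kv: directory[kv][1])
    addr = directory[shortest][0]
    hits, accessed = [], 0
    while addr is not None:
        record = records[addr]
        accessed += 1
        if all(kv in record for kv in terms):
            hits.append(addr)
        addr = record[shortest]                      # follow the embedded pointer
    return hits, accessed


AUTO = ("DEPARTMENT", "AUTO")
SAL_15000 = ("SALARY", 15000)
directory, records = build_multilist({
    AUTO: ["1.5", "1.7", "2.2", "2.5", "2.7", "3.7"],
    SAL_15000: ["0.7", "1.4", "1.7", "2.2", "2.7", "3.7", "3.8", "3.9"],
})
hits, accessed = conjunctive_query(directory, records, [AUTO, SAL_15000])
```

In this sketch six records are accessed (the length of the AUTO
list) to find the four records that satisfy the query.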

b. The Query Cost

The cost to process a query for the multilist
organization is measured in terms of the time required to
decode all key values in the query and to retrieve data
records. The query cost is therefore defined as Q = LT,
where Q is the query cost, L is the shortest list length in
a query, and T is the average time to access a record,
which includes the seek time, latency time, and data
transfer time. Thus, when the list lengths associated with
key values in the terms in a product are small, the cost
for query processing is minimized.


c. Updating Multilist Files

Lists can be ordered or unordered. Adding a
record to an ordered list requires that the record be
inserted in a specific position, while for an unordered
list the record is placed at the head of the list, thus
avoiding the need to traverse the list.

Updating of multilist files involves either
key value addition, whole record addition, or deletion.
Regardless of the type of addition, whether whole record or
new key value, one or more of the index entries are also
updated. When the lists are not ordered, there exists a
simple algorithm in which a new record can be easily
placed at the logical head of each list of which it is to
be a member. This is true of both types of additions. For
example, consider adding a key value to an existing record
because an employee who works in the HDWE department shares
his work time between the HDWE and FURN departments. The
employee's record must be updated by adding the key value
FURN to his record. Adding a new key value to a record
implies that the record must be added to the list of
records indexed by the new key value. The simple algorithm
can be used to add one or more key values to a record.
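
A minimal sketch of the head-insertion algorithm for unordered
lists follows. The function name add_key_value, the dictionary
layout, and the sample addresses are assumptions for illustration;
the sketch also assumes the record does not already carry the key
value being added.

```python
def add_key_value(directory, records, addr, kv):
    """Link the record at addr into the unordered list for kv, at its head."""
    old_head, length = directory.get(kv, (None, 0))
    records.setdefault(addr, {})[kv] = old_head   # new member points at the old head
    directory[kv] = (addr, length + 1)            # index entry: new head, longer list


FURN = ("DEPARTMENT", "FURN")
HDWE = ("DEPARTMENT", "HDWE")
directory = {FURN: ("0.6", 2), HDWE: ("1.9", 1)}
records = {"0.6": {FURN: "1.2"}, "1.2": {FURN: None}, "1.9": {HDWE: None}}

# The HDWE employee at address 1.9 now also works in FURN.
add_key_value(directory, records, "1.9", FURN)
```

No list traversal is needed: only the record itself and the index
entry for the key value are touched.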

Deleting a key value from a record is
essentially the same as deleting the record from the list
indexed by the key value. Key value and whole record
deletions can be accomplished by using a simple deletion
algorithm, which simply adjusts pointers. If, however,
deletion implies physically removing a record from
lists, and the retrieval system performs real-time updating
and retrieval, then bi-directional lists should be
considered to represent the multilist. A bi-directional
list allows deleting a record without traversing the list
to locate its predecessor record. Although bi-directional
lists allow deletions of records to be done more rapidly,
there is the storage overhead of an additional pointer
element for each key value in a record.
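
The benefit of bi-directional lists can be sketched as below. In
this hypothetical layout each key value in a record carries a
(previous, next) pointer pair, which is the extra pointer element
mentioned above; the function name unlink and the sample list are
assumptions for illustration.

```python
def unlink(directory, records, addr, kv):
    """Remove the record at addr from the bi-directional list for kv."""
    prev_addr, next_addr = records[addr].pop(kv)
    head, length = directory[kv]
    if prev_addr is None:
        head = next_addr                          # deleted record was the list head
    else:
        records[prev_addr][kv] = (records[prev_addr][kv][0], next_addr)
    if next_addr is not None:
        records[next_addr][kv] = (prev_addr, records[next_addr][kv][1])
    directory[kv] = (head, length - 1)


AUTO = ("DEPARTMENT", "AUTO")
directory = {AUTO: ("1.5", 3)}
records = {
    "1.5": {AUTO: (None, "1.7")},
    "1.7": {AUTO: ("1.5", "2.2")},
    "2.2": {AUTO: ("1.7", None)},
}
unlink(directory, records, "1.7", AUTO)
```

Only the deleted record and its two neighbors are accessed; the
list is never traversed to find the predecessor.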

2. The Inverted File Organization

Unlike the multilist files, where records are
linked together with pointers kept in the individual
records, the pointers in an inverted file are removed from
the individual records and kept in separate lists, called
inverted lists. An inverted file consists of two components,
a directory and a data file (see Figure 49). The
variable-length inverted lists of pointers corresponding to key
values are kept in the directory. Thus, when a key value is
decoded in the directory, the record address list is
immediately available and no additional access is required
to move it into the main memory. It is important to note
that the directory should be kept as small as possible so
that updating can be performed quickly and easily, and
that it can be kept in its entirety on a fast storage
device. However, unlike the directory of the multilist file,
which tends to be small in size, the directory of the
inverted file tends to be larger in size. The size of the
directory can be controlled by limiting the number of
data items on which a file is inverted. A partially
inverted file stores the record addresses associated with
all values of certain data items (i.e., keys) but not all
data items. A completely inverted file is one in which every
data item is treated as a key and the record addresses
associated with every key value are stored in inverted
lists. In this case, the directory subsumes the file, and
there is no need to keep the data file any more.
a. Answering Queries

The inverted file organization allows rapid
access to records based on any key. Queries can be
resolved by accessing and manipulating the inverted
lists of record addresses prior to accessing any data
records. This advantage is possible because the pointers
to records indexed by a key value are maintained in an
inverted list rather than in the data records. For example,
to retrieve the records of all the employees that work in
an AUTO department and have a salary of 15000 dollars,
the key values AUTO and 15000 are decoded in the proper
indexes, producing the addresses of the lists of data
records indexed by the key values DEPARTMENT/AUTO and
SALARY/15000. These addresses are the
pointers in the inverted lists for key values AUTO and
15000, respectively. The two inverted lists are moved into
the main memory, and the intersection of these two lists is
performed. The addresses in the intersection list, namely
addresses 1.7, 2.2, 2.7, and 3.7, are the addresses of the
records that are to be retrieved and that will satisfy the
query (see Figure 49).
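
The list intersection step can be sketched as follows. The sketch
assumes that each inverted list is held in ascending address order
and represents an address such as 1.7 as the tuple (1, 7); the
function name intersect and the sample lists are illustrative and
were chosen to agree with the result quoted in the text.

```python
def intersect(list_a, list_b):
    """Merge-style intersection of two ascending address lists."""
    i = j = 0
    common = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            common.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return common


inverted_lists = {
    ("DEPARTMENT", "AUTO"): [(1, 5), (1, 7), (2, 2), (2, 5), (2, 7), (3, 7)],
    ("SALARY", 15000): [(0, 7), (1, 4), (1, 7), (2, 2), (2, 7), (3, 7), (3, 8), (3, 9)],
}
to_retrieve = intersect(inverted_lists[("DEPARTMENT", "AUTO")],
                        inverted_lists[("SALARY", 15000)])
```

No data record is accessed until the intersection is complete, so
only the four qualifying records are read from the data file.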

b. The Query Cost

In the inverted file, the cost to process a
query is the sum of the cost to decode all key values in the
query, plus the cost to access all inverted lists, one per
key value, plus the cost to process the inverted lists,
plus the cost to retrieve the data records that satisfy
the query. For the inverted organization, there are L/N
records, where L is the shortest list length in a query
and N is the number of data record addresses per record,
accessed for each of the average number of terms in a
single query, designated by T. Thus, the time required to
retrieve the T inverted lists involved in the list
intersection is (L/N) * T * A, where A is the average time
to access a record and move it to internal memory.
Whereas for a multilist every record in the shortest
list, whose length is designated S, must be accessed, for
the inverted organization only those pS records, where p is
the ratio of the number of records that satisfy a query to
S, must be accessed. Therefore, the query cost is defined as
Q = (pS + (L/N) * T) * A. It is important to note that the
larger the number of records per inverted list, the larger
the amount of time to access the inverted lists.
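
As a worked comparison of the two cost formulas, Q = LT for the
multilist and Q = (pS + (L/N) * T) * A for the inverted file, the
sketch below plugs in the AUTO/15000 query. The access time of 30
milliseconds and the value N = 50 addresses per directory record
are assumed figures chosen only to make the arithmetic concrete;
the function names are likewise illustrative.

```python
def multilist_cost(shortest_len, access_time):
    """Q = L * T: every record on the shortest list is accessed."""
    return shortest_len * access_time


def inverted_cost(p, s, l, n, t, a):
    """Q = (p*S + (L/N) * T) * A: list accesses plus qualifying records only."""
    return (p * s + (l / n) * t) * a


ACCESS_MS = 30.0                       # assumed average access time per record
q_multilist = multilist_cost(6, ACCESS_MS)
q_inverted = inverted_cost(p=4 / 6,    # 4 of the 6 AUTO records qualify
                           s=6, l=6,
                           n=50,       # assumed addresses per directory record
                           t=2,        # two terms in the query
                           a=ACCESS_MS)
```

Under these assumptions the multilist query costs 180 ms and the
inverted query about 127 ms; the gap widens as p becomes smaller.
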
c. Updating Inverted Files

Updating an inverted file is more involved because
the inverted lists must be updated. For this reason,
the inverted organization is most useful for
retrieval when the update volume is relatively low
compared to the query volume. By performing intersections
and unions of inverted lists, the inverted file system can
provide exact statistics about the records having certain
key values, whereas multilist file systems can only provide
upper bounds or approximations of the record numbers of
those records. Whole record and key value additions and
deletions are accomplished by straightforward
algorithms.

3. The Cellular Multilist File Organization

The cellular multilist organization is derived from
the multilist organization. Since the performance of a
multilist system suffers when lists are lengthy, the cellular
multilist organization is an attempt to arrange the records
for more efficient retrieval with shorter lists. The length
of each list is restricted to the storage-cell size so that
records in the list do not extend beyond cell boundaries. A
cell can be considered as a track or cylinder of a disk.

Each index entry for a cellular multilist file
consists of one or more list head pointer/list length pairs.
There exists one such pair for each cell containing records
indexed by a key value. The directory for this type of file
is similar to that of the multilist file, except that it is
larger. Figure 48 illustrates the corresponding DEPARTMENT
index organized for a cellular multilist file, for the
same data as for the multilist file. The index entry for key
value AUTO has three list head pointer/list length pairs,
each one corresponding to a list in a single cell. That is,
the AUTO list in Figure 48 is subdivided into three
shorter lists, one of length two in cell 1 with the head at
address 1.5, one of length three in cell 2 with the head at
address 2.2, and one of length one in cell 3 with the head
at address 3.7.

a. Answering Queries

The cell concept is used to provide good
response time. For example, to retrieve the records of all
employees that work in both the AUTO and HDWE departments,
note that the records having key value AUTO reside in cells
1, 2, and 3, and records having key value HDWE can be found
in cells 0, 1, and 2. The only records that can be common
to both lists are located in cells 1 and 2. The AUTO list
for cell 1 is traversed, since it is the shorter list in
cell 1 (the length is two for AUTO and three for
HDWE). Each record is then examined for the existence of the
key value HDWE. Record 1.5 belongs to both lists. In cell
2, the HDWE list is the shortest, so each record in it is
accessed and examined for the occurrence of AUTO. Since
there are no records in cell 2 common to both lists, only
record 1.5 satisfies the query.
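
The cell-by-cell evaluation can be sketched as below. For brevity
the sketch stores each per-cell list as a Python list rather than
as a head pointer/length pair with pointers embedded in the
records, and the membership test stands in for examining a record
for the other key value. The function name cellular_and and the
HDWE addresses are assumptions chosen to agree with the cells and
list lengths given in the text.

```python
def cellular_and(cell_index, key_a, key_b):
    """Answer 'key_a AND key_b' by traversing the shorter list in each shared cell."""
    hits, accessed = [], 0
    shared_cells = sorted(set(cell_index[key_a]) & set(cell_index[key_b]))
    for cell in shared_cells:
        list_a, list_b = cell_index[key_a][cell], cell_index[key_b][cell]
        short, other = (list_a, list_b) if len(list_a) <= len(list_b) else (list_b, list_a)
        for addr in short:
            accessed += 1              # the record is read from the cell
            if addr in other:          # stands in for checking the other key value
                hits.append(addr)
    return hits, accessed


cell_index = {
    "AUTO": {1: ["1.5", "1.7"], 2: ["2.2", "2.5", "2.7"], 3: ["3.7"]},
    "HDWE": {0: ["0.3", "0.7"], 1: ["1.4", "1.5", "1.9"], 2: ["2.0"]},
}
hits, accessed = cellular_and(cell_index, "AUTO", "HDWE")
```

Cells 0 and 3 are skipped entirely, and only three records are
accessed: two AUTO records in cell 1 and one HDWE record in cell 2.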

Updating cellular multilist files is
essentially the same as updating multilist files.

4. Comparison of Multilist and Inverted Files

Figure 50 illustrates the advantages and
disadvantages of multilist and inverted
list files.

Organization: Multilist

   Advantages:
      Easily programmed.
      Conjunctive queries are efficiently handled for
      short lists.
      Easily updated since complete reorganization
      of lists is avoided.

   Disadvantages:
      Conjunctive queries are inefficiently handled for
      long lists.
      The number of records that satisfy a query bears no
      relation to the number of records accessed.

Organization: Inverted

   Advantages:
      Good for simple and range queries.
      Low response time for conjunctive queries.
      Efficient use of storage space if key values are
      removed from data records.
      Satisfactory for real-time retrieval.

   Disadvantages:
      Updating is complex since the inverted lists are
      variable in length and must be ordered.
      Work space in internal memory is required to
      perform the logical processing of inverted lists.

Figure 50. Comparison of Multilist and Inverted
File Organization

5. Other File Organizations for Retrieval

Other ones [Ref. 54] include the hybrid list file
organizations and ring structured file organizations.

The hybrid list file organization, as its name
implies, is a hybrid between a multilist and an inverted
list organization. Hence, this hybrid file is organized in
such a way as to minimize the system search effort: the set
of pointers for a key value is kept in a
separate inverted list if its list of records
is longer than L; otherwise, the set of pointers is
embedded in the data records as a multilist organization. In
answering queries, the retrieval is done in straight
inverted or multilist fashion, depending on the size of the
list of records.

The cost of updating for hybrid file organizations
lies somewhere between that for the two pure organizations,
since it is easier to update a multilist file than an
inverted file. The more lists that are stored as
multilists, the easier it is to perform updates.

A ring structured file organization, on the other
hand, is a linear list in which the pointer in the last
record points back to the first record, called the starting
record of the ring. In a ring structure, once an arbitrary
record is accessed, every other record in the ring also
becomes accessible. Each record in a ring structure file
consists of three elements: a value of the data item (which
is usually associated with each pointer to specify in which
rings a record is an element), data, and a pointer to the
next record. Figure 51 illustrates a ring structure in which
the value of the data item is "S" for the starting record
and "b" otherwise.

One of the primary advantages of a ring structure
is that any record can be accessed starting at any point.
An important use of ring structures is, thus, to represent
classifications (types) of data. All records having the
same classification (type) belong to the same ring.
Associated with each record within a class (type) may
be a subclass (subtype), and this subclass is represented
by a ring of records. Such a classification scheme can
be represented by use of a multiple ring structure, in
which multiple rings pass through a record with the records
in each ring logically related. Figure 52 illustrates such a
structure, corresponding to the same data as in Figure 48.
The values of the data items, MD, MS, D, and S, are used to
designate the major department, major salary, department
subring, and salary subring, respectively. A
significant disadvantage of ring structures is that they can
take a long time to search. Updating ring-structured
files is normally straightforward. Insertion of new
records into the middle of a ring is usually relatively
simple. Deletions of records, on the other hand, can be
more complex. When a record is deleted from a ring,
neither its predecessor record nor the starting record of
the ring has to be specified, since the predecessor record
can be found from any point in the ring by traversing the
structure. The deletion performed by traversing a
structure and searching for the predecessor record
requires that the address of the record to be deleted be
saved and compared with the address of the succeeding
records accessed. In this case, the entire ring must be
traversed for each deletion. Altering the value of data
items in a record poses no particular problems.

Ring structure with a head record and a special pointer in each record

Figure 51. The Ring Structure
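
A ring and the traversal-based deletion described above can be
sketched as follows. The class and function names are hypothetical,
object references stand in for record addresses, and the sketch
assumes a ring of at least two records; when the starting record
is deleted, its successor is promoted to starting record, which is
a design choice of this sketch rather than something the text
prescribes.

```python
class RingRecord:
    def __init__(self, tag, data):
        self.tag = tag       # "S" marks the starting record, "b" any other record
        self.data = data
        self.next = self     # a single record forms a ring by itself


def make_ring(items):
    """Link the items into a ring whose last record points back to the start."""
    start = RingRecord("S", items[0])
    last = start
    for item in items[1:]:
        record = RingRecord("b", item)
        record.next = start
        last.next = record
        last = record
    return start


def members(entry):
    """Every record is reachable from any entry point in the ring."""
    found, record = [entry.data], entry.next
    while record is not entry:
        found.append(record.data)
        record = record.next
    return found


def delete(target):
    """Unlink target by walking the ring until its predecessor is found."""
    pred, steps = target, 0
    while pred.next is not target:   # compare the saved address with each successor
        pred = pred.next
        steps += 1
    pred.next = target.next
    if target.tag == "S":
        target.next.tag = "S"        # promote the successor to starting record
    return steps


ring = make_ring(["A", "B", "C", "D"])
second = ring.next
from_second = members(second)
steps_taken = delete(second)
```

Deleting one record from this four-record ring takes three pointer
hops to reach the predecessor, which is the full-ring traversal
cost noted above.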

6. A Summary

The choice of a file organization has a very
important effect on the performance and associated costs of
a file system. The multilist file organization is
satisfactory for systems that require neither extremely
fast response times nor exact statistics of
the records and attributes. Nevertheless, the multilist file
organization provides a very compact directory regardless of
the volume of the data file. On the other hand, the inverted
file organization tends to generate large directories.
Consequently, fast accesses to the data file are
overshadowed by the amount of processing and accesses to the
directory. The trade-off is not to "invert" a file down to
the level of data items, i.e., fields.

The problem of selecting an appropriate file
organization depends on the particular users and their
environment. Three very important quantifiable performance
measures that should be considered in selecting a file
organization are the total storage costs, the average
time to answer a typical query, and the average time to
perform an update. The file organizations that have the best
access time may require more storage and complicate updates;
i.e., as access times decrease, storage and update costs go
up.


XII. DATA COMPRESSION TECHNIQUES


The development of massive information storage and
retrieval systems has undergone tremendous growth.
Accompanying this growth in the size of the databases has
been a large increase in the number of users and duration of
usage, resulting in tremendous amounts of data being
transferred from computers to terminals. One response
to this runaway database growth is to alleviate the data
storage problem through the representation of data by more
efficient codes, i.e., by data compression.

Data compression is a technique of reducing the amount
of storage required for a piece of stored data by replacing
the data with some representation of the difference
between it and the data next to it. Data compression
can reduce alphanumeric, numeric, and binary data to a
shorthand notation. For example, if 30 alphanumeric
positions are allocated to the occupational field of a
personnel database, then for the 9-character occupational
description PROFESSOR there are 21 blank positions.
Instead of storing the occupational title, an
equivalent 5-digit data code can be stored,
thereby eliminating 25 character positions.

An example of numerical and binary compression is as
follows: suppose today's date is 1 Jan 1986; the numerical
representation is 01 01 86, while the binary representation
is 00001 0001 1010110. Thus, the numeric compression
results in 6 numeric characters of storage, while the
binary compression results in only 5 bits for the day
field (since the day cannot exceed 31), 4 bits for
the month field, and 7 bits for the year field (permitting a
range of 1900-2027).
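
The 5/4/7-bit date layout can be sketched as below. The function
names and the choice to pack the three fields into one 16-bit
integer, day first, are assumptions for illustration; the base year
of 1900 follows from the range quoted above.

```python
def pack_date(day, month, year, base_year=1900):
    """Pack a date into 16 bits: 5-bit day, 4-bit month, 7-bit year offset."""
    if not (1 <= day <= 31 and 1 <= month <= 12 and 0 <= year - base_year <= 127):
        raise ValueError("date outside the 5/4/7-bit field ranges")
    return (day << 11) | (month << 7) | (year - base_year)


def unpack_date(packed, base_year=1900):
    """Recover (day, month, year) from the packed 16-bit value."""
    return (packed >> 11) & 0x1F, (packed >> 7) & 0x0F, (packed & 0x7F) + base_year


packed = pack_date(1, 1, 1986)
bits = format(packed, "016b")   # day | month | year offset
```

The six numeric characters 01 01 86 are thereby held in two bytes.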

Accordingly, there are five categories of data
compression. These five categories are: null suppression,
bit mapping, run-length encoding, half-byte packing, and
pattern substitution.

1. The Null-Suppression Technique

The null-suppression technique has been one of the
earliest data compression techniques. As its name implies,
it is a technique that scans a data string for repeated
blanks or null characters. Upon detection of such a
sequence, the null characters are replaced by a
special ordered pair of characters. The first is a
compression indicator, indicating that null suppression has
occurred, and the second indicates the number of null
characters encountered. For example, taking the data
stream XYZbbbbbCVF, where b denotes a blank, the compressed
data stream would be XYZ^5CVF, where ^ represents the
special compression indicator character, and 5 the quantity
of blanks compressed [Ref. 57].

The technique for decompression is very
straightforward. A search is performed for the special
character used to denote the null characters. Upon locating
the indicator, the next character indicates the number of
nulls compressed. Hence, the original string can be
reconstructed.

This technique is only effective when it is
employed for more than two sequentially encountered null
characters, since a 2-character compression sequence always
results. For nonsequential null characters, a technique
known as bit mapping is used.
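
Null suppression and its decompression can be sketched as follows.
The sketch makes several assumptions that the text does not
specify: the indicator is the character ^, the count is stored as
a single decimal digit (so runs longer than 9 are split), and the
indicator character never occurs in the data itself.

```python
INDICATOR = "^"   # assumed compression indicator; must not occur in the data


def null_suppress(text, null=" "):
    """Replace runs of more than two nulls by the indicator and a count digit."""
    out, i = [], 0
    while i < len(text):
        if text[i] == null:
            run = 1
            while i + run < len(text) and text[i + run] == null and run < 9:
                run += 1
            # A run of one or two nulls is left alone: the pair would not be shorter.
            out.append(f"{INDICATOR}{run}" if run > 2 else null * run)
            i += run
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def null_expand(packed, null=" "):
    """Rebuild the original string from the indicator/count pairs."""
    out, i = [], 0
    while i < len(packed):
        if packed[i] == INDICATOR:
            out.append(null * int(packed[i + 1]))
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return "".join(out)


original = "XYZ" + " " * 5 + "CVF"
compressed = null_suppress(original)
```

Here the 11-character stream is reduced to 8 characters, and
null_expand restores it exactly.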

2. The Bit Mapping

This compression technique is employed when the
data consists of a high proportion of specific data
types, such as numerics, or a large proportion of a specific
character, such as blanks. As its name implies, a bit map
is used to indicate the presence or absence of data
characters. For example, taking the string DbbKbbHb, where
b represents a blank, the bit mapping string would be
10010010*DKH. The zeros represent the locations of blanks,
and the * represents the bit-map character, to denote that
bit-mapping compression has taken place. In comparing
the two versions of the character string, we note that the
original data string of 8 characters of data has been
reduced to 4 characters: 3 data characters and the bit-map
character [Ref. 57].

To decompress the string, the bit map is used to
indicate that certain data characters have been encoded
and must be decoded in order to reconstruct the
original data string.
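
Bit mapping over one 8-character unit can be sketched as below. The
function names are illustrative, the bit map is returned as an
integer holding one byte, and the sketch assumes the leftmost
character corresponds to the most significant bit, as in the
example above.

```python
def bitmap_compress(text, suppressed=" "):
    """Return (8-bit map, remaining data characters) for an 8-character unit."""
    if len(text) != 8:
        raise ValueError("this sketch handles exactly one 8-character unit")
    bitmap, kept = 0, []
    for ch in text:
        bitmap <<= 1
        if ch != suppressed:
            bitmap |= 1          # 1 = data character present, 0 = suppressed
            kept.append(ch)
    return bitmap, "".join(kept)


def bitmap_expand(bitmap, kept, suppressed=" "):
    """Rebuild the 8-character unit from the bit map and the data characters."""
    chars = iter(kept)
    return "".join(
        next(chars) if (bitmap >> (7 - pos)) & 1 else suppressed
        for pos in range(8)
    )


unit = "D  K  H "
bitmap, kept = bitmap_compress(unit)
```

The eight characters become one bit-map byte plus the three data
characters D, K, and H.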

This technique is only effective as long as fixed-size
data units are utilized, such as characters, bytes, or
words. Also, the effectiveness of this technique is directly
proportional to the percentage of occurrences of a
particular character.
If there are two or more significant occurrences of other
characters, only the character with the highest occurrence
can be compressed. Another compression technique, called
run-length encoding, can handle adjacent redundancy of
occurrences of all characters in a data stream [Ref. 58].

3. The Run-Length Encoding Technique

The run-length encoding technique is a data
compression method that reduces any type of repeating
character sequence. The method employed is similar to
null suppression, in that it uses a special character
to denote that this type of compression has occurred. The
compression indicator is followed by one of the repeating
characters found in the encountered string of
repetitious characters. Finally, a count character
signifies the number of times the repeated character
occurred in the sequence. The general character format is:
a special indicator, any repeating data character, and the
character count. For each of these codes, the numerous
unassigned characters with unique bit representations can
be used [Ref. 59].
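
The three-character format (indicator, repeated character, count)
can be sketched as follows. The choice of the control character
with code 29 as indicator, the storage of the count as a single
character code (limiting a run to 255), and the minimum run of 4
are assumptions of this sketch; the indicator is assumed never to
occur in the data.

```python
INDICATOR = "\x1d"   # assumed otherwise unused control character


def rle_encode(text, min_run=4):
    """Replace runs of min_run or more equal characters by a 3-character code."""
    out, i = [], 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 255:
            run += 1
        if run >= min_run:
            out.append(INDICATOR + text[i] + chr(run))   # indicator, character, count
        else:
            out.append(text[i] * run)                    # too short to be worth coding
        i += run
    return "".join(out)


def rle_decode(packed):
    """Expand every indicator/character/count triple back into its run."""
    out, i = [], 0
    while i < len(packed):
        if packed[i] == INDICATOR:
            out.append(packed[i + 1] * ord(packed[i + 2]))
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return "".join(out)


sample = "AB" + "*" * 6 + "CD"
encoded = rle_encode(sample)
```

Unlike null suppression, any repeated character qualifies; here the
run of six asterisks shrinks the 10-character sample to 7.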

4. The Half-Byte Packing

This compression technique is used when a portion
of the bit pattern that represents certain characters in a
character set becomes repetitive. It is actually a
derivative of the bit-mapping method. For example,
consider the EBCDIC (Extended Binary-Coded
Decimal Interchange Code) character set, which is an IBM
scheme for representing characters by combinations of bits.
Half-byte packing can be utilized, since the first four
bit positions are all set to binary ones to represent
numerics.

To compress data into half bytes, up to 15
sequential numeric or predefined data characters in a string
can be compressed. The limit of 15 characters results from
the use of a 4-bit, half-byte counter to denote the number
of characters being compressed. The general format is as
follows: a special character indicating half-byte
compression, the half-byte counter, and up to 15 packed
numerics. For example, taking the numeric 2112860, the
binary is
11110010 11110001 11110001 11110010 11111000 11110110
11110000, and the compressed data string is S 0111 0010 0001
0001 0010 1000 0110 0000, where S is the special character
and 0111 is the number of packed numerics (7). The number
of bits has been reduced from 56 bits to 40 bits.
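
Half-byte packing of EBCDIC numerics can be sketched as below. The
value 0x3F used for the special character S is a hypothetical
placeholder, the function names are illustrative, and a zero
half-byte is appended as padding when the counter plus digits do
not fill a whole number of bytes.

```python
SPECIAL = 0x3F   # hypothetical code for the special character S


def pack_digits(ebcdic):
    """Pack 1 to 15 EBCDIC numerics (0xF0-0xF9) into indicator + half bytes."""
    if not 1 <= len(ebcdic) <= 15 or any(not 0xF0 <= b <= 0xF9 for b in ebcdic):
        raise ValueError("expected 1 to 15 EBCDIC numeric characters")
    nibbles = [len(ebcdic)] + [b & 0x0F for b in ebcdic]   # counter, then digits
    if len(nibbles) % 2:
        nibbles.append(0)                                  # pad to a whole byte
    body = bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))
    return bytes([SPECIAL]) + body


def unpack_digits(packed):
    """Restore the EBCDIC numerics by putting back the 1111 zone bits."""
    nibbles = [n for b in packed[1:] for n in (b >> 4, b & 0x0F)]
    count = nibbles[0]
    return bytes(0xF0 | n for n in nibbles[1:1 + count])


ebcdic_2112860 = bytes.fromhex("F2F1F1F2F8F6F0")
packed = pack_digits(ebcdic_2112860)
```

The seven EBCDIC bytes (56 bits) become five bytes (40 bits): the
special character, then 0111 0010, 0001 0001, 0010 1000, 0110 0000.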

Half-byte packing can also be used when data
characters do not have a repetitive bit structure, such
as in the ASCII (American Standard Code for Information
Interchange) tables, which are a standard scheme to
represent characters by combinations of bits. ASCII
tables employ 7 bits. A method of doing this is to
predefine the occurrence of the dollar sign, asterisk,
comma, the decimal point, and the 10 numerics. For example,
given the amount $1,234.56, the ASCII code would be

0100100 0110001 0101100 0110010 0110011 0110100 0101110
0110101 0110110,

and the compressed data string is
00100 0001 01100 0010 0011 0100 01110 0101 0110, where
three 5-bit streams represent a dollar sign, a comma, and a
decimal point, respectively, and 4-bit streams represent the
respective numerics [Ref. 58].

5. The Pattern Substitution

This compression technique substitutes a special
code for a predefined character pattern; that is,
common key words or phrases can be replaced by a special
code. To use this technique, a pattern table is
required, which contains a set of list arguments (words
or phrases to be compressed) and a set of function
values (special character codes). For example, given a
limited pattern table with list arguments at, all, and,
both, and function values S1, S2, S3, S4, respectively, the
data stream "all naval officers, both male and female, at
NPGS" becomes: S2 naval officers S4 male S3 female S1 NPGS.
The employment of pattern substitution can be highly
advantageous when texts with known repeating patterns are
stored in the database [Ref. 57].
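
Pattern substitution with the four-entry table above can be
sketched as follows. The two-character strings S1 to S4 stand in
for single special code characters, whole-word matching is an
assumption of this sketch, and punctuation is kept in the output
so that the original text can be restored exactly; the sketch also
assumes the codes never occur in the uncompressed text.

```python
import re

PATTERN_TABLE = {"at": "S1", "all": "S2", "and": "S3", "both": "S4"}


def substitute(text, table=PATTERN_TABLE):
    """Replace each whole-word list argument by its function value."""
    words = sorted(table, key=len, reverse=True)        # try longer patterns first
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


def restore(text, table=PATTERN_TABLE):
    """Replace each function value by its original word or phrase."""
    reverse = {code: word for word, code in table.items()}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, reverse)) + r")\b")
    return pattern.sub(lambda m: reverse[m.group(1)], text)


stream = "all naval officers, both male and female, at NPGS"
compressed = substitute(stream)
```

The word "naval" is left untouched because only whole words are
matched against the pattern table.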

6. The Summary

When data compression is used to reduce storage
requirements, the overall processing time is also reduced.
This is because the reduction in storage results in a
reduction of disk access attempts. Although the
compression techniques result in additional program
instructions being executed, their execution time is
significantly less than the time required to access and
transfer data. Hence, a reduction of storage requirements in
this case also results in a reduction of processing times.
The most effective means of employing these compression
techniques is to combine them as they are needed, depending


XIII. DATA MODELS FOR DATABASES


A data model is an abstract representation or description of
a database that describes how the data is put together. The
purpose of the data model is first to accurately and
completely represent the required data of a database and
second to allow the database to be understandable. The data
model also dictates the design of the corresponding data
manipulation language (DML), since each DML operation is
defined in terms of its effect on the modeled data. A DML is
a language used to access and to update a database. A DML may
be procedural or nonprocedural. To manipulate a database
using a procedural DML, a user normally writes short segments
of DML statements that traverse the modeled database in order
to locate the record to be retrieved or updated.
Nonprocedural DMLs are easier for a user to use in
manipulating a database. With this type of DML, the user does
not have to traverse a database; instead, the user specifies
only what is wanted and allows the system to decide how to
obtain it.

The more procedural a DML, the simpler it is to implement,
since the user directs the database traversal, step by step,
on how to obtain data. A nonprocedural DML is more
complicated to implement since it places the responsibility
of determining how to obtain the data and, therefore, how to
optimize the search, on the database system.
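The contrast between the two styles can be made concrete with a small Python sketch. Everything in it is hypothetical (the VENDORS records, the CITY_INDEX access path and both function names); it only illustrates who decides how the data is reached.

```python
# Hypothetical in-memory "database": a list of vendor records.
VENDORS = [
    {"vendor": "V1", "city": "Monterey"},
    {"vendor": "V2", "city": "Salinas"},
    {"vendor": "V3", "city": "Monterey"},
]

# An access path that the system, not the user, may choose to exploit.
CITY_INDEX = {}
for position, record in enumerate(VENDORS):
    CITY_INDEX.setdefault(record["city"], []).append(position)


def procedural_lookup(city):
    """Procedural style: the user spells out the traversal step by step."""
    found = []
    position = 0
    while position < len(VENDORS):       # the user controls the walk
        record = VENDORS[position]
        if record["city"] == city:
            found.append(record["vendor"])
        position += 1
    return found


def nonprocedural_lookup(**wanted):
    """Nonprocedural style: the user states only what is wanted.

    The "system" decides how to obtain it: it uses the city index when
    the request mentions a city and falls back to a full scan otherwise.
    """
    if "city" in wanted:
        candidates = [VENDORS[p] for p in CITY_INDEX.get(wanted["city"], [])]
    else:
        candidates = VENDORS
    return [r["vendor"] for r in candidates
            if all(r[k] == v for k, v in wanted.items())]
```

The second function is longer because the choice of access path has moved from the user into the system, which is the implementation burden described above.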


Database systems are categorized according to the approach
they adopt in data model and accompanying DMLs. The three
best-known approaches are the hierarchical data model, which
usually supports a procedural DML because, in general, it is
too inefficient to perform database accesses in a strictly
nonprocedural manner; the network data model, which also
supports a procedural DML because of its efficiency; and the
relational data model, which supports a nonprocedural DML
because searching of tables (relations) does not require
traversals and is easily expressed in a nonprocedural manner.


A. HIERARCHICAL DATA MODEL

The hierarchical data model is a tree-structured
organization which represents the data as a set of nested
one-to-many (1:N) and one-to-one (1:1) relationships. A
one-to-many association from a record of type A to a set of
records of type B means that at each point in time, a given
record of A is associated with zero, one, or a number of
records of B. This association is represented with a
double-headed arrow. A one-to-one association, on the other
hand, means that for a specified period of time, a given
record of A is associated with one and only one record of B.
This association is represented with a single-headed arrow.
In implementations, associations are carried out by
record-address pointers. We term record type B directly
below record type A.

Hierarchies, although they are familiar structures, are very
explicit in a data model. If a record type is not directly
below any record type in the hierarchy, then no access to
that record type is possible. Multiple subhierarchies
(subtrees) are allowed, but there can be only one parent at
the top, that is, one root, the apex of the hierarchy. Figure
53 illustrates the above points.

The basic operation on a hierarchical database is a tree walk
(traversal). The search starts at the root and continues
through all its descendants of the given record type until
the query is satisfied. This model uses pointers extensively.
These pointers could point to a dependent child record, to
the next record, or to the parent record. These links
(pointers) in a hierarchical structure are unidirectional
from parent to child (descendant). This convention causes
certain relationships to be hard to extract from the
database, although they may be implied in the data. This
anomaly affects each of the basic storage operations:
insert, delete, and update.
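A tree walk over parent-to-child pointers can be sketched as follows. The Record class, the function tree_walk and the customer/order/product layout are our own assumptions for illustration; they are not taken from Figure 53.

```python
class Record:
    """A record occurrence holding only parent-to-child pointers."""

    def __init__(self, rtype, key):
        self.rtype = rtype
        self.key = key
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def tree_walk(root, rtype, key):
    """Pre-order walk from the root.

    Returns the list of keys on the path from the root to the first
    record of the given type and key, or None if there is no match.
    """
    stack = [(root, [root.key])]
    while stack:
        node, path = stack.pop()
        if node.rtype == rtype and node.key == key:
            return path
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, path + [child.key]))
    return None


def sample_hierarchy():
    """Hypothetical data: one customer with two orders of one product each."""
    customer = Record("customer", "C1")
    customer.add(Record("order", "O1")).add(Record("product", "P7"))
    customer.add(Record("order", "O2")).add(Record("product", "P9"))
    return customer
```

Because the links run only downward, the question "which customer ordered product P9" cannot start at the product. It is answered by walking from the root and reading the path that tree_walk returns.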

Insertions are not possible in every case. Data concerning a
new order cannot be inserted without introducing a special
dummy customer, until the order is associated with some
actual customer (see Figure 53). Deletions are corollaries to
insertions. The only way to delete a shipment is to delete
an order, and if the only order is deleted, all information
dependent on that order is also lost. Updating a specific
record presents the problem of either searching the entire
model to find every occurrence of that record or introducing
an inconsistency. For example, to change the city for a
supplier/vendor that makes deliveries for the orders, either
the entire database is searched for that supplier or that
supplier may be shown in one city at one point and in another
city at another point.

Normally, hierarchical databases use the inverted technique
for indexing as a way to avoid prolonged traversals. In
Figure 53, the leaf (descendant) product can be indexed by
product#, thereby allowing any record to find its parent
[Ref. 60].

B. NETWORK DATA MODEL

The network data model represents data as a set of record
types and pairwise relationships between records of two
record types. It is a more general structure than a hierarchy
because a given record occurrence may have any number of
immediate parents, as well as any number of immediate
dependents. This model is not limited to just one parent.
Hence, it can have many-to-many relationships, as in Figure
54, which is the network version of the hierarchical model
described in Figure 53. The network model also supports
one-to-many relationships.

Although the network data model does not have the anomalies
of the hierarchical model, storage operations are not as
straightforward as expected. Insertions are simple. To
insert data concerning a new supplier, a new supplier record
occurrence is created. Deletions, on the other hand, confront
the user with a choice of strategy. To delete the shipment
associating a product with a vendor, there are two strategies
for locating this occurrence: one that starts at the supplier
and scans its chain looking for a pairwise relationship to
the product, and one that starts at the product and scans its
chain looking for a pairwise relationship to the supplier.
The choice can be significant. Updating is straightforward.
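The two deletion strategies can be compared with a small sketch. The classes Owner and Link, and the function delete_shipment, are hypothetical names chosen for this illustration. Each shipment is a link record that sits on the chain of one vendor and on the chain of one product, and the function reports how many links it had to scan.

```python
class Owner:
    """A vendor or product record that owns a chain of link records."""

    def __init__(self, key):
        self.key = key
        self.chain = []


class Link:
    """A shipment occurrence connecting one vendor and one product."""

    def __init__(self, vendor, product):
        self.vendor = vendor
        self.product = product
        vendor.chain.append(self)
        product.chain.append(self)


def delete_shipment(vendor, product, start_at="vendor"):
    """Remove the shipment; return the number of links scanned to find it."""
    if start_at == "vendor":
        start, other_attr, other = vendor, "product", product
    else:
        start, other_attr, other = product, "vendor", vendor
    for scanned, link in enumerate(start.chain, start=1):
        if getattr(link, other_attr) is other:
            vendor.chain.remove(link)
            product.chain.remove(link)
            return scanned            # stop immediately after the removal
    raise LookupError("no such shipment")


def sample_network():
    """Hypothetical data: V1 supplies five products, V2 also supplies P4."""
    v1, v2 = Owner("V1"), Owner("V2")
    products = [Owner("P%d" % i) for i in range(5)]
    for p in products:
        Link(v1, p)
    Link(v2, products[4])
    return v1, v2, products
```

With this sample data, locating the shipment of P4 by V1 takes five link inspections when the scan starts at the vendor and one when it starts at the product, which is why the choice of strategy can be significant.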

In most network database systems, retrieval begins with
accessing a parent record via some entry point into the
database. The search then continues through the relevant
database records by getting the first or next record in a
relationship. Because of this complexity, keeping track of
where in the database the current search is taking place is
a chore [Ref. 61].

The purpose of this model is to convey what is implemented in
the database. Many relationship types can be easily depicted,
and both the relationship type and the record type are
explicitly stated.

C. THE RELATIONAL DATA MODEL

A relational data model, as its name implies, uses the
concept of relations to represent files. A relation is a
two-dimensional table, which contains rows (tuples) that
correspond to records of a flat file. A flat file contains
no repeating groups, i.e., there is exactly one value at
every row and column (attribute) position and never a set of
values. A table represents one record type and each row
represents a particular record of that type. Columns are
attributes, with all values in a column having the same
domain, which is the set of possible values for an attribute.
An important feature of a relational database is that
associations between rows are represented solely by data
values in columns drawn from a common domain. Figure 55
illustrates the relational version of the hierarchical
database described in Figure 53. In this example, CUSTOMER,
PRODUCT, and VENDOR are basic relations that exist
independently from all other data. The ORDER relation can
also exist independently, except for one of its attributes,
customer#, for which no more than one CUSTOMER tuple may
have the same value. This attribute implements the
Orders-for-Customer relationship in Figure 53, i.e., any
value of customer# found in an ORDER tuple logically should
exist as a customer# in some unique existing CUSTOMER tuple.
The other relations work in a similar manner [Ref. 62].

The files in a relational database may be organized using
any of the known file organization techniques, such as heap,
sequential, indexed sequential, hash, etc. As for storage
operations, insertions, deletions, and updates are all
easily handled.
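The idea that associations are carried solely by data values can be tried out with Python's standard sqlite3 module. This is a sketch under our own assumptions: the table and column names are simplified stand-ins for the CUSTOMER and ORDER relations, and the sample rows are invented.

```python
import sqlite3


def build_db():
    """Create two small relations in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customer (
            customer_no      INTEGER PRIMARY KEY,
            customer_address TEXT
        );
        CREATE TABLE orders (
            order_no     INTEGER PRIMARY KEY,
            customer_no  INTEGER REFERENCES customer(customer_no),
            total_amount REAL
        );
    """)
    db.executemany("INSERT INTO customer VALUES (?, ?)",
                   [(1, "Monterey"), (2, "Salinas")])
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(10, 1, 250.0), (11, 1, 75.5), (12, 2, 40.0)])
    return db


def orders_for_city(db, city):
    """Nonprocedural request: state what is wanted, not how to find it.

    The two tables are associated only through equal customer_no values;
    no pointer from an order to its customer is stored anywhere.
    """
    rows = db.execute("""
        SELECT o.order_no
        FROM orders AS o
        JOIN customer AS c ON o.customer_no = c.customer_no
        WHERE c.customer_address = ?
        ORDER BY o.order_no
    """, (city,))
    return [row[0] for row in rows]
```

Deleting an order is a single statement, for example `db.execute("DELETE FROM orders WHERE order_no = 11")`, and it leaves the customer row in place, in contrast with the deletion anomaly of the hierarchical model.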


D. A SUMMARY

In the hierarchical and network approaches certain
relationships are represented by means of links. Basically
such links are capable of representing one-to-many
associations; the difference between the two approaches is
that with the latter, links may be combined to model more
complex many-to-many associations, whereas this is not
possible with the former. Another difference, not emphasized
here, is that links are generally named in a network and
anonymous in a hierarchy.

The relational model organizes data into tables of like data
and supports intertable linkages through common data
occurrences rather than pointers.

In short, the hierarchical data model is the most natural,
the most familiar and the best understood one when it is
used to represent engineering design databases. In such a
database, we have records on assemblies, records on
subassemblies, records on components of a subassembly,
records on parts of a component and records on individual
parts. These records naturally form a hierarchy, so a
hierarchical database system can best be used to access,
manipulate and update engineering design databases. The
network model is the most natural and offers flexibility for
an inventory control application. In inventory control a
product may have many suppliers and a supplier may produce
many products. A network database system can easily manage,
access and update these many-to-many relationships. The
relational data model is efficient and understandable for
interactive use and ad hoc queries. It is a relatively new
entry to database management. The current commercial
database machines are relational, for example, the
Britton-Lee Intelligent Database Machine and the Teradata
database computer, DBC/1012.

Figure 55. Relational version of the database of Figure 53:

CUSTOMER(CUSTOMER#, CUSTOMER-ADDRESS, CUSTOMER-DETAILS)
ORDER(ORDER#, CUSTOMER#, ORDER-DATE, DELIVERY-DATE,
      TOTAL-AMOUNT)
PRODUCT(PRODUCT#, DESCRIPTION, PRICE, QUANTITY-ON-HAND)
ORDER-LINE(ORDER#, PRODUCT#, QUANTITY-ORDERED,
           EXTENDED-PRICE)
VENDOR(VENDOR#, VENDOR-NAME, VENDOR-CITY)
SUPPLIES(VENDOR#, PRODUCT#)


XIV. DIFFERENTIAL FILES


A failure or breakdown of a database can be catastrophic if
there are no recovery techniques to recover the data that has
been damaged. A recovery technique can be used to restore
data, in such a situation, to a usable state. This is
accomplished by maintaining recovery data to make recovery
possible. It provides recovery from a failure which does not
affect the recovery data or the mechanisms used to maintain
the recovery data and to restore the states of the data in
the database. The most popular recovery technique is the use
of differential files.

A differential file consists of a relatively small storage
area, in which all database alterations are recorded. It is
an efficient method for storing a large and changing
database. It is analogous to an errata list for a book.
Rather than print a new edition each time that a change in
text is desired, a publisher distributes an errata sheet
along with the book that identifies corrections in the book
by page and line number.

Under a differential database representation, the main files
are kept unchanged until reorganization, which can occur at
intervals ranging from hours to months, depending on usage.
All changes that would normally be made to a main file as a
result of a transaction are instead registered in a
differential file. The differential file is always searched
first when data is to be retrieved. Data not found in the
differential file is retrieved from the main database.
Verhofstad, in [Ref. 64], describes an efficient hashing
method to implement the recovery technique.

A. THE CONSTRUCTION OF A DIFFERENTIAL FILE

To implement a differential file, a small associative memory
in the form of a bit map accessed by a hashing scheme is
used. To reduce the probability of making an unnecessary
search in the differential file, the database system checks
the bit map to see whether the bits for a record are set
before accessing that record (see Figure 56). If the bits are
set, the record is probably in the differential file;
otherwise, the main file has to be searched. The hashing
function maps the record address onto a number of bits in
the bit map.
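The construction can be sketched in Python as follows. The class name DifferentialStore, the bit map size, the number of hash positions and the use of SHA-256 as the hashing function are all assumptions made for this illustration; the scheme of [Ref. 64] may differ in every one of these details. Deletions are not handled.

```python
import hashlib


class DifferentialStore:
    """Main file plus differential file, guarded by a hashed bit map."""

    def __init__(self, main_file, bitmap_bits=1024, hashes=3):
        self.main = main_file                 # never modified by updates
        self.diff = {}                        # the differential file
        self.bits = bytearray(bitmap_bits)    # one byte per bit, for clarity
        self.hashes = hashes
        self.diff_searches = 0                # searches actually performed

    def _positions(self, address):
        # Map a record address onto several positions in the bit map.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.bits)

    def update(self, address, value):
        """Register a change in the differential file and set its bits."""
        self.diff[address] = value
        for pos in self._positions(address):
            self.bits[pos] = 1

    def read(self, address):
        """Search the differential file only when all bits are set."""
        if all(self.bits[pos] for pos in self._positions(address)):
            self.diff_searches += 1
            if address in self.diff:
                return self.diff[address]
            # Bits set by other records: an unnecessary search.
        return self.main.get(address)

    def reorganize(self):
        """Fold the differential file into a new main file."""
        self.main = {**self.main, **self.diff}
        self.diff = {}
        self.bits = bytearray(len(self.bits))
```

A cleared bit proves that the record was never updated, so the differential file can be skipped. Set bits only make its presence probable, which is why read still falls back to the main file.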

B. ADVANTAGES OF A DIFFERENTIAL FILE

Severance and Lohman, in [Ref. 65], describe five advantages
of differential files. The first three advantages relate to
database integrity, i.e., the correctness of data to be
recovered. They are reduction of backup costs, speedup of
recovery, and minimization of serious data loss. The other
two advantages are operational: a differential file can
provide increased data availability and simultaneously
reduce storage and retrieval costs.

1. The Reduction of Backup Costs

To recover data from a database that has failed, a previous
state must be reloaded. The methods available to generate
previous states employ either a total dump or an incremental
dump.

In a total dump, the entire database is copied to a backup
file, and this backup copy is reloaded after a failure. The
frequency with which the database is copied to its backup
file is dependent upon the database usage. When it is
impractical to dump the entire database, incremental dumping
is performed, in which sequential sections of changes made
to a database are periodically dumped. Frequent dumps permit
fast recovery, but are associated with a higher system
overhead.

A differential file can drastically reduce the cost of
backing up a large database, since the time required for a
dump is proportional to the volume of data being copied.
This is particularly true when the proportion of records
changed during a backup period is small. For example, a
total dump may require up to 6 hours, assuming that updates
are made 5 days per week, 10 hours per day. A differential
file, on the other hand, for the same period of time, could
be dumped in less than two minutes. Moreover, a differential
file would occupy less than one disk as compared to over 50
for a total dump.

A differential file also permits both real-time dumping and
reorganization with concurrent updates. During a conventional
backup procedure, no updating is possible. But by building a
"differential-differential file", updating can continue. For
most applications, this file will be quite small and can
reasonably be stored in main memory. Acting as a cache store
during the dump, it is scanned before every retrieval. When
the dump has been completed, its records are incorporated
into the main differential file. The same procedure would
apply for online reorganization.
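The role of the differential-differential file during a dump can be sketched as a third lookup layer. The class LayeredStore and its method names are hypothetical, plain dictionaries stand in for the files, and the snapshot returned by begin_dump stands in for the copy written to the backup medium.

```python
class LayeredStore:
    """Main file, differential file, and a temporary in-memory cache."""

    def __init__(self, main, differential):
        self.main = main
        self.differential = differential
        self.dump_cache = None     # the "differential-differential file"

    def begin_dump(self):
        """Freeze the differential file and return the image to back up."""
        self.dump_cache = {}
        return dict(self.differential)

    def update(self, address, value):
        # During a dump, changes go to the cache, not the frozen file.
        if self.dump_cache is None:
            self.differential[address] = value
        else:
            self.dump_cache[address] = value

    def read(self, address):
        # The cache is scanned before every retrieval while it exists.
        layers = [self.differential, self.main]
        if self.dump_cache is not None:
            layers.insert(0, self.dump_cache)
        for layer in layers:
            if address in layer:
                return layer[address]
        return None

    def end_dump(self):
        """Incorporate the cached records into the differential file."""
        self.differential.update(self.dump_cache)
        self.dump_cache = None
```

Updates made while the dump is running are visible to readers at once, yet the image being copied stays consistent because the differential file itself is not touched until end_dump.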

2. The Speedup of Recovery

The major portion of recovery time for a traditional
recovery method is spent individually reapplying updates to
a small fraction of the restored records. This small subset
of changed records guarantees that even localized physical
damage will require a lengthy recovery procedure. A
differential file, on the other hand, by concentrating
updates in a small physical area, minimizes the critical
exposure area of the database. Most physical damage can be
quickly repaired with a localized backup-copy procedure.
Also, the critical area can be allocated to a more reliable
device type than is practical for the larger main file, and
this critical area can be duplexed to provide the most
valuable redundancy for a marginal increase in operating
costs. Moreover, since the use of a differential file can
dramatically reduce the cost of dumping a large database, an
inexpensive dump procedure can be invoked frequently to
reduce the number of changes to be remade in the event of a
database loss.

3. The Increase of Data Availability

Traditional online updating requires complex software to
assure data recoverability. Therefore, updates are normally
batched for end-of-day processing, to minimize overhead.
With a differential file, since the main file and its
associative index are not affected by updates, a less
complex and more efficient software procedure is required,
and a greater density of data storage can be achieved:
neither free space nor record pointers need to be allocated
for record growth. Moreover, the cost reduction that a
differential file provides will greatly enlarge the realm of
database systems.

4. The Summary

A differential file is the most popular representation of
database recovery techniques. By localizing modifications in
a small storage area and physically isolating it from the
main file, it is possible to realize the important benefits
described above. A differential file is conceptually


XV. THE SUMMARY OF THE THESIS


Computerized database applications have grown over the past
thirty years to a point where they have now become a
pervasive influence in our society. For the past thirty
years conventional magnetic recording has, almost
exclusively, fulfilled the online storage requirements of
this database applications community.

As the range of applications has grown, a continuing concern
has been the cost and access time of online database
storage. A wide range of technologies has been investigated
to address this challenge. As rapid as the progress in
storage technology has been, the need for more capacity with
faster access has increased even faster.

While conventional magnetic recording is entering yet
another phase of explosive growth in applications and
advances in technology in order to meet these stringent
requirements, optical disks have begun to challenge the
magnetic media. There are pressures to break free of the
limitations of magnetic storage where large volumes of data
are involved. These pressures come from the continuing
growth of conventional storage, existing requirements of
large corporate and governmental databases, and the
development of new applications such as storage of digitized
documents, where large volumes of data must be stored at a
lower cost. Such applications often demand a cost, capacity
and performance combination that is difficult to achieve
magnetically. Optical storage is able to provide performance
that is competitive with that of magnetic recording. In
fact, emerging optical technologies are already capable of
replacing magnetic disks in certain applications. However,
there is no single technology that is right for all
applications. Whereas technologies such as RAMs are fast and
technologies such as magnetic disks and optical disks are
inexpensive, we know of nothing that is both fast and
inexpensive. Thus, database installations often have
available a wide range of different storage technologies.
The needs of an application must be analyzed to determine
the appropriate technology to use.

In this thesis we have examined high-volume, on-line storage
media of current and emerging technologies and software
techniques for supporting these on-line, high-capacity
storage and access requirements. In the first part, we have
analyzed such media as vertical magnetic recording, thin
film media, optical data disks, magneto-optic disks, bubble
and Bernoulli-effect disks. Then, comparisons and evaluations
of products and product categories have been illustrated. In
the second part, we have reviewed the modern software
techniques for on-line database storage and access. We have
explored the techniques for data abstraction, data access
and retrieval, data compaction, data models for storage, and
differential files. There are advantages and disadvantages
to all technologies and techniques. The individual
application of the organization must be used to dictate the
specific requirements, along with its financial constraints.
With these requirements, the organization can then take
advantage of certain strong points of the hardware
technologies and the software techniques and utilize them in
meeting the requirements. This thesis has provided a
comprehensive and up-to-date analysis of the strong points
and weak points of the hardware technologies and software
techniques, which in turn makes it easier for an organization
to meet its requirements for online storage and access.